FOREWORD


No single line of development leads to the intersection of digital methods 
and scholarship in the humanities. The mid-twentieth-century activity of 
Roberto Busa is frequently cited in origin stories of digital humanities. His 
collaboration with IBM was undertaken when Thomas Watson saw the 
potential for automating the Jesuit scholar’s concordance of the works of 
St Thomas Aquinas. 

But instruments essential to automation include intellectual and tech- 
nological components with much longer histories. These connect to such 
mechanical precedents as the Pascaline, a calculating device named for its 
seventeenth-century inventor, the philosopher Blaise Pascal, and to the 
finely crafted Jacquard punch cards used to direct shuttle movements in 
the nineteenth-century textile industry. In addition, a wide array of systems 
of formal logic, procedural mathematics and statistical methods developed 
over centuries have provided essential foundations for computational 
operations. 

Many features of contemporary networked scholarship can be tracked 
to earlier information systems of knowledge management and even ancient 
strategies of record-keeping. Scholarly practices were always mediated 
through technologies and infrastructures whether these were hand-copied 
scrolls and codices, shelving systems and catalogues or other methods 
for search, retrieval and reproduction. The imprints of Babylonian grids, 
arithmetic visualisations, legacy metadata and classification schemes remain 
present across contemporary knowledge work. Now that the role of 
digital technology in scholarship has become conspicuous in any and every 
domain, the line between technological and humanistic domains is
sometimes hard to discern in our daily habits. Even the least computationally
savvy scholar regularly makes use of digital resources and activities. Of 
course, discrepancies arise across cultures and geographies, and certainly
not all scholarship in every community exists in the same networked 
environment. Plenty of ‘traditional’ scholarship continues in direct contact 
with physical artefacts of every kind. But much of what has been considered 
‘the humanities’ within the long traditions of Western and Asian culture 
now functions on foundations of digitised infrastructure for access and 
distribution. To a lesser degree, it also functions with the assistance of 
computational processing that supports not only search and retrieval, but also
analysis and presentation in quantitative and graphical display. 

As we know, familiar technologies tend to become invisible as they 
increase in efficiency. The functional device does not, generally, call atten- 
tion to itself when it performs its tasks seamlessly. The values of technolog- 
ical optimisation privilege these qualities as virtues. Habitual consumers of 
streaming content are generally uninterested in a Brechtian experience of
alienation meant to raise awareness of the conditions of viewing within 
a cultural-ideological matrix. Such interventions would be tedious and 
distracting and only lead to frustration whether part of an entertainment 
experience or a scholarly and pedagogical one. Consciousness raising 
through aesthetic work has gone the way of the early twentieth-century 
avant-garde ‘slap in the face of public taste’ and other tactics to shock the 
bourgeoisie. The humanities must stream along with the rest of content 
delivered on demand and in consumable form, preferably with special 
effects and in small packets of readily digested material. Immersive Van 
Gogh and Frida Kahlo exhibits have joined the traditional experience of 
looking at painting. The expectations of gallery viewers are now hyped 
by theme park standards. The gamification of classrooms and learn- 
ing environments caters to attention-distracted participants. Humanities 
research struggles in such a context even as knowledge of history, ancient, 
indigenous and classical languages and expertise in scholarly methods such 
as bibliography and critical editing are increasingly rare. 

Meanwhile, debates in digital humanities have become fractious in 
recent years with ‘critical’ approaches pitting themselves against earlier 
practices characterised as overly positivistic and reductive. Pushback and 
counterarguments have split the field without resulting in any substantive 
innovation in computational methods, just a shift in rhetoric and claims. 
Algorithmic bias is easier to critique than change, even if advocates 
assert the need to do so. The question of whether statistical and formal 
methods imported from the social sciences can adequately serve humanistic
scholarship remains open and contentious several decades into the use 
of now-established practices. Even exuberant early practitioners rarely 
promoted digital humanities as a unified or salvific approach to scholarly 
transformation, though they sometimes adopted computational techniques 
somewhat naively, using programs and platforms in a black box mode, 
unaware of their internal workings. 

The instrumental aspects of digital infrastructure are only one topic 
of current dialogues. From the outset of my own encounter with digital 
humanities, sometime in the mid-1990s, what I found intriguing were the 
intellectual exigencies asserted as a requirement for working within the 
constraints of these formal systems. While all of this has become normative 
in the last decades, a quarter of a century ago, the recognition that much 
that had been implicit in humanities scholarship had to be made explicit 
within a digital context produced a certain frisson of excitement. The 
task of rethinking how we thought, of thinking in relation to different 
requirements, of learning to imagine algorithmic approaches to research, 
to conceptualise our explorations in terms of complex systems, emergent 
properties, probabilistic results and other frameworks infused our research 
with exciting possibilities. 

As I have noted, much of that novelty has become normative—no longer 
called self-consciously to attention—just as awareness of the interface is lost 
in the familiar GUI screens. Still, new questions did get asked and answered 
through automated methods—mainly benefits at scale, the ‘reading’ of 
a corpus rather than a work, for instance. Innovative research crosses 
material sciences and the humanities, promotes non-invasive archaeology 
and supports authentication and attribution studies. Errors and mistakes
abound, of course, and the misuse of platforms and processes is a regular 
feature of bad digital humanities—think of all those network diagrams 
whose structure is read as semantic even though it is produced to optimise 
screen legibility. Misreading and poor scholarship are hardly exclusive to 
digital projects even if the bases for claims to authority are structured 
differently in practices based on human versus machine interpretation. 

Increased sophistication in automated processes (such as named entity 
recognition, part of speech parsing, visual feature analysis etc.) continues 
to refine results. But the challenge that remains is to learn to think in and 
with the technological premises. A digital project is not just an automated 
version of a traditional project; it is a project conceived from the outset 
in terms that structure the problems and possible outcomes according 
to what automated and computational processes enable. Using statistical
sampling methods is not a machine-supported version of serendipity or 
chance encounter—it is a structurally and intellectually different activity. 
Photoshop is not just a camera on steroids; it is a means of abstracting 
visual information into discrete components that can be manipulated in 
ways that were not conceptually or physically possible in wet darkrooms. 
Similarly, other common programs like Gephi, Cytoscape, Voyant and 
Tableau contain conceptual features un-thought and unthinkable in ana- 
logue environments—but they need to be engaged with an understanding 
of what those features allow. 

Detractors scoff, sceptics cringe and the naysayers of various critical 
stripes protest that all of this aligns with various agendas—political, neolib- 
eral, free market or whatever—as if intellectual life had ever been free 
of conditions and contexts. Where were those pure humanities scholars 
of a bygone era? Working for the Church? The State? Elite universities? 
Administrative units within the legal systems of national power structures? 
The science labs that hatched nefarious outcomes in the name of pure 
research? Finger pointing and head wagging will not change the reality that 
the humanities have always been integrated into civilisations and cultures 
to serve partisan agendas and hegemonic power structures. Poetics and 
aesthetics provide insight into the conditions from which they arise; they 
are not independent of them. We no longer subscribe to Matthew Arnold’s
belief that the ‘best that has been made and thought’ of human expression
contributes to moral uplift and improvement. Everything is
complicit. Digital humanities is hardly the first, nor likely to be the last,
instrument of exclusivity or oppression, as well as of liberation and social
progress.

The labour of scholarship continues along with the pedagogy to sustain 
it. This activity imprints many values and judgements in the materials and 
methods on which it proceeds. Basic activities like classification model the 
way objects are found and identified. Crucial decisions about digitisation— 
size, scale, quality and source—affect what is presented for study. Terms of 
access and use create hierarchies among communities, some of whom have 
more resources than others. In short, at every point in the chain of interre- 
lated social and material activities that create digital assets, implementation 
and intellectual implications are combined. The charge to address social ills 
and inequities freights projects in digital humanities with tasks of reparation 
and redress, asking that the field bear the weight of an entire agenda of social
justice. The relation between ethics and its application to digital work raises
more issues of institutional resources and epistemology than technical ones.
Fairness requires equal opportunity for skilled production and access, as 
well as a share in the interpretative discourse. No substitute exists for doing 
the work and having the educational and material resources to do it. The 
first step in transforming a field is the choice to acquire its competencies. 
Ignorant critique is as pernicious and ineffectual as unthinking practice. 

Lorella Viola’s argument about the current state of digital scholarship 
is meant to shift the frameworks for understanding these issues. Historical 
tensions are evident in the way her subtitle frames her work as ‘Beyond 
Critical Digital Humanities’. Acknowledging debates that have often pitted 
first wave digital humanists against later critics, she positions her own 
research as ‘post-authentic’ by contrast. This term signals dismissal of the 
last shred of belief that digital and computational techniques were value 
neutral or promoted objectivity (a position taken only by a fraction of 
practitioners). But it also distances her from the standard ‘critique’ of 
these methods—that they are tools of a neoliberal university environment 
promoting entrepreneurial approaches to scholarship at the expense of 
some other not very clearly articulated alternative (another very worn out 
line of discussion). 

Keen to move beyond all this, Viola advocates ‘symbiotic’ and ‘mutualist’
as concepts that eschew many old binarisms and disciplinary bound-
aries. While acknowledging the range of work on which she herself 
is building, and the historical development of positions and counter- 
positions, she seeks to integrate critical principles into digital methods and 
projects from within her own experience of their practice. Her work is 
grounded in knowledge of text analysis and computational linguistics as 
well as interface design and visualisation. While her summary of polarised 
positions forms the opening section of this book, and underpins much of 
what she offers as an alternative, her vision of the way forward is synthetic 
and affirmative. Throughout, she invokes a post-authentic framework 
that emphasises critical engagement with digital operations as mediations. 
She frequently reiterates the point that geo-coding or sentiment analysis
works within the dominant power structures that privilege Anglo-centric 
approaches and English language materials. Such biases are in part due 
to the historical site in which the work arose. Certain environments have 
more resources than others. The issue now is to create opportunities for 
transformation. 

The larger assertions of Viola’s project are crucial: that artificial bina- 
risms that pit traditional/analogue and computational/digital approaches
against each other, critical methods against technical ones and so on are
distractions from the core issues: How is humanities scholarship to proceed?
What intellectual expertise is required to work with and read into
and through the processes and conditions in which we conceptualise the 
research we do? 

Digital objects and computational processes have specific qualities that 
are distinct from those of analogue ones, and learning to think in those 
modes is essential to conceptualising the questions these environments 
enable, while recognising their limitations. Thinking as an algorithm, not 
just ‘using’ technology, requires a shift of intellectual understanding. But 
knowledge of disciplinary domains and traditions remains a crucial part 
of the essential expertise of the scholar. Without subject area expertise, 
humanities research is pointless whether carried out with digital or tradi- 
tional methods. As long as human beings are the main players—agents or 
participants—in humanities research, no substitute or surrogate for that 
expertise can arise. When that ceases to be the case, these questions and 
debates will no longer matter. No sane or ethical humanist would hasten 
the arrival of the moment when we cease to engage in human discourse. 
No matter how much agency and efficacy are imagined to emerge in the 
application of digital methods, or how deeply we may come to love our 
robo-pets and AI-bot-assistants, the humanities are still intimately and
ultimately linked to human experience of being in the world. Finally, the 
challenge to infuse computational methods with humanistic values, such 
as the capacity to tolerate ambiguity and complexity, remains. What, after 
all, is a humanistic algorithm, a bias-sensitive digital format or a self- 
conscious interface? What interventions in the technology would result in 
these transformations? 

Lorella Viola has much to say on these matters and works from experi- 
ence that combines hands-on engagement with computational methods 
and a critical framework that advances insight and understanding. So, 
machine and human readers, turn your attention to her text. 


Los Angeles, CA, USA Johanna Drucker 
May 2022 


PREFACE 


In 2016, the Los Angeles Review of Books (LARB) conducted a series of 
interviews with both scholars and critics of digital humanities titled ‘The 
Digital in the Humanities’. The aim of the special interview series was 
‘to explore the intersection of the digital and the humanities’ (Dinsman 
2016) and the impact of that intersection on teaching and research. As I 
was reading through the various interviews collected in the series, there 
was something that recurrently caught my attention. Despite an extensive 
use of terminology that attempted to communicate ideas of unity—for 
example, digital humanities is described as a field that ‘melds computer 
science with hermeneutics’—it gradually became obvious to me how the 
traditional rigid notions of separation and dualism that characterise our 
contemporary model of knowledge creation were creeping in, surrepti- 
tiously but consistently. The following excerpt from the editorial to the 
special issue provides a good example (emphasis mine): 


“digital humanities” seems astoundingly inappropriate for an area of study 
that includes, on one hand, computational research, digital reading and 
writing platforms, digital pedagogy, open-access publishing, augmented 
texts, and literary databases, and on the other, media archaeology and theories 
of networks, gaming, and wares both hard and soft. (ibid.) 


Language is see-through. It is the functional description of our mental 
models, that is, it expresses our conceptual understanding of the world. 
In the example above, the use of the construction ‘on one hand...on 
the other’ conveys an image of two distinct, contrasting, polarised entities
which are antithetical in essence and which, despite
intersecting (as is said a few lines below), remain fundamentally separate.
This description of digital humanities reflects a specific mental model, 
that knowledge is made up of competences delimited by established, 
disciplinary boundaries. Should there be overlapping spaces, boundaries 
neither dissolve nor merge; rather, disciplines further specialise and
create yet new fields. 

When the COVID-19 pandemic was at its peak, I was spending my 
days in my loft in Luxembourg and most of my activities were digital. I 
was working online, keeping contact with my family and friends online, 
watching the news online and taking online courses and online fitness 
classes. I even took part in an online choir project and in an online pub 
quiz. Of course, for my friends, family and acquaintances, it was not much
different. As the days became weeks and the weeks became months and 
then years, it was quite obvious that it was no longer a matter of having 
the digital in our lives; rather, now everyone was in the digital. 

This book is titled ‘The Humanities in the Digital’ as an intentional
reference to the LARB’s interview series. With this title, I wanted to mark 
how the digital is now integral to society and its functioning, including how 
society produces knowledge and culture. The change in word order is meant
to signal conclusively the obsolescence of binary modulations in relation
to the digital which continue to suggest a division, for example, between 
digital knowledge production and non-digital knowledge production. Not 
only that. It is the argument of this book that dual notions of this kind are 
the spectre of a much deeper fracture, that which divides knowledge into 
disciplines and disciplines into two areas: the sciences and the humanities. 
This rigid conceptualisation of division and competition, I maintain, is 
complicit in having promoted a narrative which has paired computational
methods with exactness and neutrality, rigour and authoritativeness whilst 
stigmatising consciousness and criticality as carriers of biases, unreliability 
and inequality. 

This book argues against a compartmentalisation of knowledge and 
maintains that division into disciplines is not only unhelpful and conceptually
limiting, but especially after the exponential digital acceleration brought 
about by the 2020 COVID-19 pandemic, also incompatible with the 
current reality. In the pages that follow, I analyse many of the different 
ways in which reality has been transformed by technology—the pervasive 
adoption of big data, the fetishisation of algorithms and automation and 
the digitisation of education and research—and I argue that the full 
digitisation of society, already well on its way before the COVID-19
pandemic but certainly brought to its non-reversible turning point by the 
2020 health crisis, has added levels of complexity to reality that our model 
of knowledge, based as it is on a single-discipline perspective, can no longer
explain. With this book, my intention is to have the necessary conversation 
that the historical moment demands. 

The book is therefore primarily a reflection on the separation of knowl- 
edge into disciplines and of disciplines into the sciences vs the humanities 
and discusses its contemporary relevance and adequacy in relation to
the ubiquitous impact of digital technologies on society and culture. In 
arguing in favour of a reconfigured model of knowledge creation in the 
digital, I propose different notions, practices and values theorised in a novel 
conceptual and methodological framework, the post-authentic framework. 
This framework offers a more complex and articulated conceptualisation 
of digital objects than the one found in dominant narratives which reduce 
them to mere collections of data points. Digital objects are understood 
as living compositions of humans, entities and processes interconnected 
according to various modulations of power embedded in computational 
processes, actors and societies. Countless versions can be created through 
such processes which are shaped by past actions and in turn shape the 
following ones; thus digital objects are never finished, nor can they ever be
finished, and they ultimately transcend traditional questions of authenticity.

Digital objects act in and react to society and therefore bear con- 
sequences; the post-authentic framework rethinks both products and 
processes which are acknowledged as never neutral, incorporating external, 
situated systems of interpretation and management. Taking the humanities 
as a focal point, I analyse personal use cases in a variety of applied contexts 
such as digital heritage practices, digital linguistic injustice, critical digital 
literacy and critical digital visualisation. I examine how I addressed in my 
own work issues in digital practice such as transparency, documentation 
and reproducibility, questions about reliability, authenticity, biases, ambi- 
guity and uncertainty and engaging with sources through technology. I 
discuss these case examples in the context of the novel conceptual and 
methodological framework that I propose, the post-authentic framework. 
By recognising the larger cultural relevance of digital objects and the 
methods to create them, analyse them and visualise them, throughout the 
chapters of the book, I show how the post-authentic framework affords 
an architecture for issues such as transparency, replicability, Open Access, 
sustainability, data manipulation, accountability and visual display. 
The Humanities in the Digital ultimately aims to address the increasingly
pressing questions: how do we create knowledge today? And how do we 
want the next generation of students to be trained? Beyond the rigid model 
of knowledge creation still fundamentally based on notions of separation 
and competition, the book shows another way: knowledge creation in the 
digital. 


Esch-sur-Alzette, Luxembourg Lorella Viola 
July 2022 


ACKNOWLEDGEMENTS 


The Humanities in the Digital is published Open Access thanks to sup- 
port from three funding schemes: the Fonds National de la Recherche 
Luxembourg (FNR)—RESCOM: Scientific Monographs scheme, the Luxembourg
Centre for Contemporary and Digital History (C²DH)—Digital
Research Axis Fund and the C²DH Open Access Fund. My sincerest
thanks for funding this book project. Your support accelerates discovery 
and creates fairer access to knowledge that is open to all.

The research carried out for this book was supported by FNR. Parts 
of the use cases illustrated in the book, including the conceptual work 
done towards developing the interface for topic modelling illustrated in 
Chap. 5, stem from research carried out within the project DIGITAL
HISTORY ADVANCED RESEARCH PROJECTS ACCELERATOR—DHARPA.
The discussions about network analysis and sentiment analysis and the 
interface examples of DeXTER described in Chap. 5 are the result of
research carried out within the THINKERING GRANT, which was awarded
to me by the C²DH.

I would like to express my deepest gratitude to Sean Takats for his 
support and continuous encouragement during these years; without it, 
writing this book would have been much harder. A big thank you to the 
DHARPA project as a whole which has been instrumental for elaborating 
the framework proposed in this book. I have been really lucky to be part 
of it. 

I warmly thank Mariella de Crouy Chanel; my exchanges with you
helped me identify and solve challenges in my work. Outside of official
meetings, I have also greatly enjoyed our chats during lunch and coffee
breaks. A very great thank you to Sean Takats, Machteld Venken, Andreas 
Musolff, Angela Cunningham, Joris van Eijnatten and Andreas Fickers for 
taking the time to comment on earlier versions of this book; thanks for 
being my critical readers. And thank you to the C²DH; in the Centre, I
found fantastic colleagues and impressive expertise and resources. I could
not have asked for more.

I am deeply grateful to Jaap Verheul and Joris van Eijnatten for providing 
me with invaluable intellectual support and guidance when I was taking my 
first steps in digital humanities. You have both always shown respect for my 
ideas and for me as a professional and a learner and I will always cherish 
the memory of my time at Utrecht University. 

My sincere thanks to the Transatlantic research project Oceanic 
Exchanges (OcEx) of which I had the privilege to be part. In those years, I 
had the opportunity to work with leading historians and computer scientists;
without OcEx, I would not be the scholar that I am today. 

Very special thanks to my family, to whom this book is dedicated, for
always listening, supporting me and encouraging me to aim high and never 
give up. Your love and care are my light through life; I love you with all my 
heart. And a very big thank you to my little nephew Emanuele, who thinks 
I must be very cold away from Sicily. Your drawings of us in the Sicilian 
sun have indeed warmed many cold, Luxembourgish winter days. 

I heartily thank Johanna Drucker for writing the Foreword to this book 
and more importantly for producing inspiring and important science. 

Thanks also to those who have supported this book project right from 
the start, including the anonymous reviewers who have provided insights 
and valuable comments that have contributed considerably to improving my work.

And finally, thanks to all those who, directly or indirectly, have been 
involved in this venture; in sometimes unpredictable ways, you have all 
contributed to making this book possible.


CONTENTS 


1 The Humanities in the digital 1 
2 The Importance of Being Digital 37 
3 The Opposite of Unsupervised 57 
4 How Discrete 81 
5 What the Graph 107 
6 Conclusion 137 
References 147 


Index 169 




ABOUT THE AUTHOR 


Lorella Viola is research associate at the Luxembourg Centre for Contemporary
and Digital History (C²DH), University of Luxembourg. She
is co-Principal Investigator in the Luxembourg National Research Fund 
project DHARPA (Digital History Advanced Research Projects
Accelerator). She was previously research associate at Utrecht University, where
she was Work Package Leader in the Transatlantic digital humanities 
research project Oceanic Exchanges. Currently, she is co-editing the
volume Multilingual Digital Humanities (Routledge), which brings together,
advances and reflects on recent work on the social and cultural relevance 
of multilingualism for digital knowledge production. Her scholarship has 
appeared in, among others, Digital Scholarship in the Humanities, Frontiers
in Artificial Intelligence, International Journal of Humanities and Arts 
Computing and Reviews in Digital Humanities. 




ACRONYMS


AI        Artificial intelligence
API       Application Programming Interface
BoW       Bag of words
CAGR      Compound annual growth rate
CDH       Critical Digital Humanities
CHS       Critical Heritage Studies
C²DH      Centre for Contemporary and Digital History
DDTM      Discourse-driven topic modelling
DeXTER    DeepteXTminER
DH        Digital humanities
DHA       Discourse-historical approach
DHARPA    Digital History Advanced Research Projects Accelerator
GNM       GeoNewsMiner
GPE       Geopolitical entity
GUI       Graphical user interface
IR        Information retrieval
ITM       Interactive topic modelling
LARB      Los Angeles Review of Books
LDA       Latent Dirichlet Allocation
LDCs      Least developed countries
LLDCs     Landlocked developing countries
LOC       Locality entity
ML        Machine learning
NA        Network analysis
NDNP      National Digital Newspaper Program
NEH       National Endowment for the Humanities
NER       Named entity recognition
NLP       Natural language processing
OcEx      Oceanic Exchanges
OCR       Optical character recognition
ORG       Organisation entity
PER       Person entity
POS       Parts-of-speech tagging
SA        Sentiment analysis
SIDS      Small island developing states
TF-IDF    Term frequency-inverse document frequency
UI        User interface
UN        United Nations
UNESCO    United Nations Educational, Scientific and Cultural Organization


LIST OF FIGURES


Fig. 3.1   Variation of removed material (in percentage) across issues/years of Cronaca Sovversiva   63

Fig. 3.2   Impact of pre-processing operations on ChroniclItaly 3.0 per title. Figure taken from Viola and Fiscarelli (2021b)   64

Fig. 3.3   Variation of removed material (in percentage) across issues/years of L’Italia   65

Fig. 3.4   Distribution of entities per title after intervention. Positive bars indicate a decreased number of entities after the process, whilst negative bars indicate an increased number. Figure taken from Viola and Fiscarelli (2021b)   68

Fig. 3.5   Logarithmic distribution of selected entities for SA across titles. Figure taken from Viola and Fiscarelli (2021b)   75

Fig. 5.1   Distribution of issues within ChroniclItaly 3.0 per title. Red lines indicate at least one issue in a three-month period. Figure taken from Viola and Fiscarelli (2021b)   115

Fig. 5.2   Wireframe of a post-authentic interface for topic modelling: sources upload. The wireframe displays how the post-authentic framework to metadata information could guide the development of an interface. Wireframe by the author and Mariella de Crouy Chanel   116

Fig. 5.3   Post-authentic framework to sources metadata information display. Interactive visualisation available at https://observablehq.com/@dharpa-project/timestamped-corpus. Visualisation by the author and Mariella de Crouy Chanel   117

Fig. 5.4   Post-authentic interface for topic modelling: data preview. The wireframe displays how the post-authentic framework could guide the development of an interface for exploring the sources. Wireframe by the author and Mariella de Crouy Chanel   119

Fig. 5.5   Interface for topic modelling: data pre-processing. The wireframe displays how the post-authentic framework to UI could make pre-processing more transparent to users. Wireframe by the author and Mariella de Crouy Chanel   120

Fig. 5.6   Interface for topic modelling: data pre-processing (stemming and lemmatising). The wireframe displays how the post-authentic framework to UI could make stemming and lemmatising more transparent to users. Wireframe by the author and Mariella de Crouy Chanel   121

Fig. 5.7   Interface for topic modelling: corpus preparation. The wireframe displays how the post-authentic framework to UI could make corpus preparation more transparent to users. Wireframe by the author and Mariella de Crouy Chanel   122

Fig. 5.8   DeXTER default landing interface for NA. The red oval highlights the time bar (historicise feature)   127

Fig. 5.9   DeXTER default landing interface for NA. The red oval highlights the different title parameters   128

Fig. 5.10   DeXTER default landing interface for NA. The red ovals highlight the frequency and sentiment polarity parameters   129

Fig. 5.11   DeXTER: egocentric network for the node sicilia across all titles in the collection in sentences with prevailing positive sentiment   131

Fig. 5.12   DeXTER: network for the ego sicilia and alters across titles in the collection in sentences with prevailing positive sentiment   132

Fig. 5.13   DeXTER: default issue-focused network graph   133




CHAPTER 1 


The Humanities in the Digital


The ultimate, hidden truth of the world is that it is something that we make, 
and could just as easily make differently. (Graeber 2013) 


1.1 IN THE DIGITAL 


The digital transformation of society was hailed as an imperative, unstoppable revolution that would provide unparalleled opportunities to our increasingly globalised societies. Among other benefits, it was praised for its ability to accelerate innovation and economic growth, increase flexibility and productivity, reduce waste, simplify and facilitate the provision of services and information, and improve competitiveness by drastically reducing development time and cost (Komarčević et al. 2017).
At the same time, however, the considerable hype was accompanied by warnings about the dramatic and disruptive changes and outcomes that it would inevitably bring. For example, several economists raised serious concerns about the major risks that the digital transformation of society would entail. A non-negligible number of evidence-based studies projected a rise in social inequality, job loss and job insecurity, wage deflation, increased polarisation in society, issues of environmental sustainability, local and global threats to security and privacy, a decrease in trust, ethical questions on the use of data by organisations and governments and on online profiling, outdated regulations, issues of accountability in relation to algorithmic governance, the erosion of social security and the intensification of isolation, anxiety, stress and exhaustion (e.g., Autor et al. 2003; Cook and Van Horn 2011; Hannak et al. 2014; Lacy and Rutqvist 2015; Weinelt 2016; Frey and Osborne 2017; Komarčević et al. 2017; Schwab 2017).

Despite all the evidence, however, the extraordinary collective advantages presented by the new technologies were believed to far outweigh the risks (Weinelt 2016; Komarčević et al. 2017; Schwab 2017). Indeed, the prevailing tendency was to describe these great dangers as ‘challenges’ which, however significant, were believed to be within governments’ reach. The digital transformation of society would undoubtedly provide unprecedented ‘opportunities’ to collaborate across geographies, sectors and disciplines, so, on the whole, the highly praised positives of the digital revolution naturally overshadowed the negatives. Some experts remark that this is hardly surprising: for a revolution to be accomplished, the necessary support must be mobilised by governments, universities, research institutions, citizens and businesses (Komarčević et al. 2017).

Thus, in the last decade, though with differences across countries, both the public and the private sector have embraced the digital transformation (European Center for Digital Competitiveness 2021). Governments around the world have increasingly implemented comprehensive technology-driven programmes and legal frameworks aimed at boosting innovation and entrepreneurship, whilst the industrial sector as a whole has invested massively in digitising business processes, work organisation and culture, modalities of market access, models of management and relationships with customers (ibid.). Over the years, the digital transformation has thus forced businesses and governments to revolutionise their infrastructures to incorporate an effective and comprehensive digital strategy. Indeed, as so often in history, the choice of whether or not to adopt the new technology quickly became a choice between innovation and extinction.

The digital transformation has profoundly affected research as well. The incorporation of technology into scholarly practice and culture, the implementation of data-driven approaches, and the size and complexity of usable and used data have increased exponentially in the natural, computational and social sciences and in humanities research. The ‘Digital Turn’, as it is called, has all but forced scholars to integrate advanced quantitative methods into their research, and in the humanities at large it has, for example, led to the emergence of completely new fields, such as, of course, digital humanities (DH) (Viola and Verheul 2020b).

Institutionally, universities have in contrast been slow to adapt. Although bringing the digital to education and research has been on higher education institutions’ agendas for years, the changes have always been planned for gradual implementation over the span of several years. In other words, universities have adopted an evolutionary approach to the digital (Alenezi 2021), according to which digital benefits are incorporated within an existing model of knowledge creation. This means that, on the one hand, the integration of the digital into knowledge creation practices and the combination of methods and perspectives from different disciplines are highly encouraged and much praised as the most effective way to accelerate and expand knowledge. On the other hand, however, technology and the digital are seen as entities somewhat separate, or indeed separable, from knowledge creation itself. This moderate approach allows for a gradual pace of change, and it is generally praised for minimising disruption while still enabling change (Komarčević et al. 2017; Microsoft Partner Community 2018).

The reasons why universities have traditionally chosen this strategy are various and complex, but generally speaking they all have something in common. In his book Learning Reimagined, Graham Brown-Martin (2014) argues that the current model of education is still the same as the one that was set up to prepare the industrial workforce of nineteenth-century factories. This model was designed to create workers who would do their job silently all day to produce identical products; collaboration, creativity and critical thinking were precisely what the model aimed to discourage. As this system has become less and less relevant over the years, it has become increasingly costly to replace the existing infrastructures, which would include radically rethinking teaching and learning practices and devising a new model of knowledge creation that would suit higher education’s mission while also responding to the needs of the new digital information and knowledge landscape. Therefore, the preferred strategy for higher education institutions has traditionally been to integrate digital tools progressively into their existing systems, as a means to advance educational practices whilst containing the exorbitant costs that a true revolution would entail, including the inevitable disruptive changes. After all, despite what the word ‘revolution’ may suggest, these complex and radical processes are painfully slow and always take years to implement. In fact, as the ‘Gartner Hype Cycle’ of technology¹ indicates (Fenn and Raskino 2008), only some of these processes are expected eventually to reach the ‘Plateau of Productivity’, and if there is a cost to adapting slowly, the cost of being wrong is higher.

The 2020 health crisis changed all this. In just a few months, the COVID-19 pandemic accelerated years of change in the functioning of society, including the way companies in all sectors operated. In 2020, the McKinsey Global Institute surveyed 800 executives from a wide variety of sectors based in the United States, Australia, Canada, China, France, Germany, India, Spain and the United Kingdom (Sua et al. 2020). The report showed that since the start of the pandemic, companies had accelerated the digitisation of both their internal and external operations by three to four years, while the share of digital or digitally enabled products in their portfolios had advanced by seven years. Crucially, the study also provided insights into the long-term effects of such changes: companies claimed that they were now investing in their long-term digital transformations more than in anything else. According to a BDO report on the digital transformation brought about by the COVID crisis (Cohron et al. 2020, 2), businesses that had developed and implemented digital strategies prior to the pandemic were in a position to leapfrog their less digital competitors, whereas organisations that failed to adapt their digital capabilities for the post-coronavirus future would simply be surpassed.

Higher education has also been deeply affected. Before the COVID-19 crisis, higher education institutions tended to regard technology not as a critical component of their success but rather as one piece of the pedagogical puzzle, useful both to achieve greater access and as a source of cost efficiency. For example, many academics had never designed or delivered an online course, supervised students online, served as online examiners, or presented at or attended an online conference, let alone organised one. According to the United Nations Educational, Scientific and Cultural Organization (UNESCO), at the first peak of the crisis in April 2020, more than 1.6 billion students around the world were affected by campus closures (UNESCO 2020). As on-campus learning was no longer possible, demand for online courses saw an unprecedented rise. Coursera, for example, experienced a 543% increase in new course enrolments between mid-March and mid-May 2020 alone (DeVaney et al. 2020). Having to adapt to the virtual switch much more quickly than they had previously considered feasible, universities and higher education institutions were forced to implement temporary digital solutions of some kind to meet the demands of students, academics, researchers and support staff. At the peak of the pandemic, classes had to be moved online practically overnight, and so did all sorts of academic interactions that would typically occur face-to-face: supervisions, meetings, seminars, workshops and conferences, to name but a few. Universities and research institutes had little choice but to respond rapidly. Thus, just as in the business sector, the shift towards digital channels had to happen fast, as those institutions that did not promptly and successfully achieve the transition towards the digital were at high risk of dramatically reducing their competitiveness, and not just in the near term.

The sudden accelerated digital shift by universities is one aspect of society’s forced digital switch during 2020. Remote work, omnichannel commerce, digital content consumption, platformification and digital health solutions are also examples of how society was kept afloat by the migration to the digital during the pandemic. This is not the kind of process that can be fully reversed. On the contrary, the most significant changes, such as remote working, online offerings and remote interactions, are in fact the most likely to remain in the long term, at least in some hybrid form. According to the McKinsey Global Institute survey (op. cit.), because such changes reflect new health and hygiene sensitivities, respondents were more than twice as likely to believe that there would be no full return to pre-crisis norms at all. Similarly, predictions for higher education anticipated that digital or digitally enhanced offerings were likely to stay even after the health crisis had been resolved. Dynamic and blended approaches are therefore likely to become the ‘new normal’, as they allow universities to minimise potential teaching and learning disruptions in case of emergency and, more importantly, they can now be implemented at a moment’s notice. Consequently, instructors are increasingly required to reimagine their courses for an online format. The same goes for all the other aspects of a scholar’s life, such as conference presentations, seminars, workshops, supervisions and exams, as well as research-specific tasks, including data gathering and analysis.

Finally, COVID-19 has also changed the role of technology, particularly with regard to its crucial function in universities’ risk mitigation strategies. According to the 2020 Coursera guide for universities to build and scale online learning programmes, universities that are investing heavily in their digital infrastructures today will be able to pivot seamlessly through any future crisis (DeVaney et al. 2020, 1). Although the digitisation of society was already under way before the crisis, these reports argue that the COVID-19 pandemic marked a clear turning point of historic proportions for technology adoption, sharply accelerating the paradigm shift towards digitisation.

Yet if during the health crisis companies and universities were forced to adopt similar digitisation strategies, almost three years after the start of the pandemic the two sectors look different again. To succeed and adapt to the demands of the new digital market, companies understood that in addition to investing massively in their digital infrastructures, they crucially also had to create new business models to replace the existing ones, which had simply become inadequate to respond to the rules dictated by new generations of customers and technologies. The digital transformation has therefore required a deeper transformation in the way businesses structure their organisations, think about market challenges and approach problem-solving (Morze and Strutynska 2021). In contrast, higher education appears to have gone back to viewing technology as a means of incremental change, once again as a way to enhance learning approaches or to reduce costs, while its disruptive and truly revolutionary impact continues to be poorly understood and, on the whole, under-theorised (Branch et al. 2020; Alenezi 2021). For instance, although universities and research institutes have to various degrees digitised pedagogical approaches, added digital skills to their curricula and favoured the use and development of digital methods and tools for research and teaching, technology is still treated as something contextual, something that happens alongside knowledge creation.

Knowledge creation, however, happens in society. And while society has been radically transformed by technology, which has in turn transformed culture and the way society creates it, universities continue to adopt an evolutionary approach to the digital (Alenezi 2021): more or less gradual adjustments are made to incorporate it, but the existing model of knowledge creation is left essentially intact. The argument I advance in this book, on the contrary, is that digitisation has involved a much greater change, a more fundamental shift for knowledge creation than the current model of knowledge production accommodates. This shift, I claim, has in fact been in, as opposed to towards, the digital. As societies are in the digital, one profound consequence of this shift is that research and knowledge are in turn inevitably mediated by the digital to various degrees. At a bare minimum, for example, regardless of the discipline, a post-COVID researcher is someone able to use a broad set of digital tools effectively. Yet what this entails for how knowledge production is now lived, reimagined, conceptualised, managed and shared has not yet been adequately explored, let alone formally addressed. In relation to knowledge creation, what I therefore argue for is a revolutionary rather than an evolutionary approach to the digital. Whereas an evolutionary approach extends the existing model of knowledge creation to incorporate the digital in some supporting role, a revolutionary approach calls for a different model, one that entirely reconceptualises the digital and how it affects the very practices of knowledge production. Indeed, claiming that the shift has been in the digital acknowledges conclusively that the digital is now integral not only to society and its functioning but also, crucially, to how society produces knowledge and culture.

Crucially, such a different model of knowledge production must break with persisting, obsolete binary modulations in relation to the digital (for example, between digital and non-digital knowledge creation), in that they continue to suggest artificial divisions. It is the argument of this book that dual notions of this kind are the spectre of a much deeper fracture, the one that divides knowledge into disciplines and disciplines into two areas: the sciences and the humanities. Significantly, one consequence of the shift in the digital is that reality has been complexified rather than simplified. Many of the levels of complexity that the digital brings to reality are so convoluted and unpredictable that the traditional model of knowledge creation, based on single-discipline perspectives and divisions, is not only unhelpful and conceptually limiting but, especially after the exponential digital acceleration brought about by the 2020 COVID-19 pandemic, also incompatible with the current reality and no longer suited to understanding and explaining the ramifications of this unpredictability.

In arguing against a compartmentalisation of knowledge which essentially disconnects rather than connects expertise (Stehr and Weingart 2000), I maintain that the insistently rigid conceptualisation of division and competition is complicit in promoting a narrative which has paired computational methods with exactness and neutrality, rigour and authoritativeness, whilst stigmatising consciousness and criticality as carriers of biases, unreliability and inequality. The book is therefore primarily a reflection on the separation of knowledge into disciplines, and of disciplines into the sciences vs the humanities, and it discusses the contemporary relevance and adequacy of this separation in relation to the ubiquitous impact of digital technologies on society and culture. In the pages that follow, I analyse many of the ways in which reality has been transformed by technology, including the pervasive adoption of big data, the fetishisation of algorithms and automation, the digitisation of education and research and the illusory, yet widely believed, promise of objectivism. I argue that the full digitisation of society, already well under way before the COVID-19 pandemic but certainly brought to its point of no return by the 2020 health crisis, has added even further complexity to reality, exacerbating existing fractures and disparities and posing complex new questions that urgently require a re-theorisation of the current model of knowledge creation.

In advocating for a new model of knowledge production, the book firmly opposes notions of division, particularly a division of knowledge into monolithic disciplines. I contend that recent events have brought into sharper focus how understanding knowledge in terms of disciplinary compartmentalisation is anachronistic and ill-equipped to encapsulate and explain society. The pandemic has ultimately called for a reconceptualisation of knowledge creation and practices, which must now operate beyond outdated models of separation. In moving beyond the current rigid framework within which knowledge production still operates, I introduce different concepts and definitions in reference to the digital, digital objects and practices of knowledge production in the digital, which break with dialectical principles of dualism and antagonism, including dichotomous notions of digital vs non-digital positions.

This book focuses on the humanities, the area of academic knowledge that has already been radically transformed by the digital in the last two decades. I start by retracing schisms in the field between the humanities, the digital humanities (DH) and critical digital humanities (CDH); these are embedded, I argue, within the old dichotomy of sciences vs humanities and the persistent physics envy in our society and, by extension, in research and academic knowledge. I especially challenge existing notions such as that of ‘mainstream humanities’, which characterise the field as seemingly non-digital but critical. I maintain that in the current landscape, conceptualisations of this kind have more the colour of a nostalgic invocation of a reality that no longer exists, perhaps as an attempt to reconstruct the core identity of a pre-digital scholar who now more than ever feels directly threatened by an aggressive other: the digital. Equally irrelevant and unhelpful, I argue, is a further division of the humanities into DH and CDH. In pursuing this argument, I examine how, on the one hand, scholars arguing in favour of CDH claim that the distinction between digital and analogue is pointless and that humanists must therefore embrace the digital critically; on the other hand, by creating a new field, i.e., CDH, they fall into the trap of perpetuating in practice the very separation between digital and critical that they define as no longer relevant.

In pursuing my case for a novel model of knowledge creation in the digital, throughout the book I analyse personal use cases; specifically, I examine how I have addressed in my own work issues in digital practice such as transparency, documentation and reproducibility, questions about reliability, authenticity and biases, and ways of engaging with sources through technology. Across the various examples presented in the following chapters, this book demonstrates that a re-examination of digital knowledge creation can no longer be achieved from a distance but only from the inside, and that the digital is no longer contextual to knowledge creation; rather, knowledge is created in the digital. This auto-ethnographic and self-reflexive approach allows me to show how my practice as a humanist in the digital has evolved over time and through the development of different digital projects. My intention is not simply to confront algorithms as instruments of automation but to unpack ‘the cultural forms emerging in their shadows’ (Gillespie 2014, 168). To this end, expanding on critical posthumanities theories (Braidotti 2017; Braidotti and Fuller 2019), I develop a new framework for digital knowledge creation practices, the post-authentic framework (cfr. Chap. 2), which critiques current positivistic and deterministic views and offers new concepts and methods to be applied to digital objects and to knowledge creation in the digital.

A little less than a decade ago, Berry and Dieter (2015) claimed that we were rapidly entering a world in which it was increasingly difficult to find culture outside digital media. The major premise of this book is that, especially after COVID-19, all information is now digital and, even more, algorithms have become central nodes of knowledge and culture production with an increased capacity to shape society at large. I therefore maintain that universities and higher education institutions can no longer afford to consider the digital as something that is happening to knowledge creation. It is time to recognise that knowledge creation is happening in the digital. As digital vs non-digital positions have entirely lost relevance, we must recognise that the current model of knowledge grounded in rigid divisions is at best irrelevant and unhelpful and at worst artificial and harmful. Scholars, researchers, universities and institutions therefore have a central role to play in assessing how digital knowledge is created, not just today but also for future generations, and clear responsibilities to shoulder: those that come from being in the digital.


1.2 THE ALGORITHM MADE ME DO IT!


Computational technology such as artificial intelligence (AI) can in many ways
be thought of as a ‘Mechanical Turk’. The Mechanical
Turk or simply ‘The Turk’ was a chess-playing machine constructed by 
Wolfgang von Kempelen in the late eighteenth century. The mechanism 
appeared to be able to play a game of chess against a human opponent 
completely by itself. The Turk was brought to various exhibitions and 
demonstrations around Europe and the Americas for over eighty years and 
won most of the games played, defeating opponents such as Napoleon 
Bonaparte and Benjamin Franklin. In reality, the Mechanical Turk was a 
complex, mechanical illusion that was in fact operated by a human chess 
master hiding inside the machine. 

AI and technology can in many ways be likened to the Mechanical Turk, in that the choices and actions hidden from view create the illusion of both a fully autonomous process and an impartial output. And just as the Mechanical Turk was celebrated and paraded, the ‘Digital Turn’ and its flow of data have been applauded and welcomed practically everywhere. Indeed, hyped up by the reassuring promises of neutrality, objectivity, fairness and accuracy held out by digital technology and data, both industry and academia have embraced the so-called big data revolution, that is, data-sets so large and complex that no traditional software, let alone a human, could ever analyse them. In 2017, IBM reported that more than 90% of the world’s data had been created in the previous two years alone. Today, in sectors such as healthcare, big data is being used to reduce healthcare costs for individuals, to improve the accuracy of diagnoses and reduce waiting times, to avoid preventable diseases effectively or to predict epidemic outbreaks. The market for big data analytics in healthcare has grown continually, and not just since the COVID-19 pandemic. According to a 2020 report about big data in healthcare, the global big data healthcare analytics market was worth over $14.7 billion in 2018 and $22.6 billion in 2019, and was expected to be worth $67.82 billion by 2025. A more recent projection in June 2020 estimated this growth to reach $80.21 billion by 2026, exhibiting a CAGR² of 27.5% (ResearchAndMarkets.com 2020).

Big data analytics has also been incorporated into the banking sector for tasks such as improving the accuracy of the risk models used by banks and financial institutions. In credit management, banks use big data to detect fraud signals or to understand customer behaviour from the analysis of investment patterns, shopping trends, motivation to invest and personal or financial background. According to recent predictions, the market for big data analytics in banking could rise to $62.10 billion by 2025 (Flynn 2020). Ever larger and more complex data-sets are also used for law and order policy (e.g., predictive policing), for mapping user behaviour (e.g., social media), for recording speech (e.g., Alexa, Google Assistant, Siri) and for collecting and measuring individuals’ physiological data, such as their heart rate, sleep patterns, blood pressure or skin conductance. And these are just a few examples.

More data, and therefore more accuracy and freedom from subjectivity, were also promised to research. Disciplines across scientific domains have increasingly incorporated technology within their traditional workflows and developed advanced data-driven approaches to analyse ever larger and more complex data-sets. In the spirit of breaking with the old schemes of opaque practices, however, it is arguably the humanities that has been impacted the most by this explosion of data. Thanks to the endless flow of searchable material provided by the Digital Turn, humanists could now finally change the wholly hermeneutical tradition, believed to perpetuate discrimination and biases.

This looked like ‘that noble dream’ (Novick 1988). Millions of records of sources seemed to be just a click away. Any humanist scholar with a laptop and an Internet connection could potentially access them, explore them and analyse them. Even more revolutionary was the possibility of finally drawing conclusions from objective evidence and so dismissing all accusations that the humanities was a field of obscure, non-replicable methods. Through large quantities of ‘data’, humanists could now understand the past more wholly, draw more rigorous comparisons with the present and even predict the future. This ‘DH moment’, as it was called, was perfectly in line with a more global trend in which data was (and to a large extent still is) presumed to be accurate and unbiased, and therefore more reliable and, ultimately, fairer (Christin 2016). The ‘DH promise’ (Thomas 2014; Moretti 2016) was a promise of freedom: freedom from subjectivity, from unreliability, but more importantly from the supposed irrelevance of the humanities in a data-driven world. It was also steeped in positivistic hype about the endless opportunities of data-driven research methods in general and for humanities research in particular, such as the artful deception of suddenly being able to access everything or the scientistic belief in data as being more reliable than sources.

Following this positivistic hype, however, the unquestioning belief in the endless possibilities and benefits of applying computational techniques for the good of society and research began to be harshly criticised as false and unrealistic (cfr. Sect. 1.3). The alluring and reassuring promises of data neutrality, objectivity, fairness and accuracy have indeed proved illusory, and algorithms and data-driven methods have been found to be even more biased than the interpretative act itself (Dobson 2019) and, ironically, in desperate need of human judgement to avoid causing harm (Gillespie 2014).

In particular, the indiscriminate use of big data in domains of societal influence such as bureaucracy, policy-making or policing has started to raise fundamental questions about democracy, ethics and accountability. For example, data companies hired by politicians all over the world have used questionable methods to mine the social media profiles of voters to influence election results through a technique called microtargeting, which uses extremely targeted messages to influence users’ behaviour. Although this technique has proven highly effective for marketing purposes, the causal effect of political microtargeting remains largely under-researched and is therefore still poorly understood. The fact remains, however, that the use of personal data collected without the user’s knowledge or permission to build sophisticated profiling models raises ethical and privacy issues. For example, in 2015, Cambridge Analytica acquired the personal data of about 87 million Facebook users without their explicit permission. Their data had been collected via the 270,000 Facebook users who had given the third-party app ‘This Is Your Digital Life’ access to information on their friends’ network. Cambridge Analytica had acquired and used such data claiming it was exclusively for academic purposes; Facebook had then allowed the app to harvest data from the Facebook friends of the app’s users, which was subsequently used by Cambridge Analytica. In this way, although only 270,000 people had given permission to the app, data was in fact collected from 87 million users. This revealed an alarming loophole in Facebook’s privacy agreement concerning the management of personal data; it raised serious concerns about how private digital information is collected, stored and shared, not just by Facebook but by companies in general, and about how these opaque processes often leave unaware individuals completely powerless.
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But it is not just tech giants and academic research that jumped on the 
suspicious big data and AI bandwagon; governments around the world 
have also been exploiting this technology for matters of governance, law 
enforcement and surveillance, such as blacklisting and the so-called pre- 
dictive policing, a data-driven analytics method used by law enforcement 
departments to predict perpetrators, victims or locations of future crimes. 
Predictive policing software analyses large sets of historic and current crime 
data using machine learning (ML) algorithms to determine where and 
when to deploy police (i.e., place-based predictive policing) or to identify 
individuals who are allegedly more likely to commit or be a victim ofa crime 
(i.e., person-based predictive policing). While supporters of predictive 
policing argue that these systems help predict future crimes more accurately 
and objectively than police’s traditional methods, critics complain about 
the lack of transparency in how these systems actually work and are used 
and warn about the dangers of blindly trusting the supposed rigour of this 
technology. For example, in June 2020, Santa Cruz, California—one of 
the first US cities to pilot this technology in 2011—was also the first city 
in the United States to ban its municipal use. After nine years, the city of 
Santa Cruz decided to discontinue the programme over concerns of how 
it perpetuated racial inequality. The argument is that, as the data-sets used 
by these systems include only reported crimes, the obtained predictions are 
deeply flawed and biased and result in what could be seen as a self-fulfilling 
prophecy. In this respect, Matthew Guariglia maintains that ‘predictive 
policing is tailor-made to further victimize communities that are already 
overpoliced—namely, communities of colour, unhoused individuals, and 
immigrants—by using the cloak of scientific legitimacy and the supposed 
unbiased nature of data’ (Guariglia 2020). Despite other examples of 
predictive policing programmes being discontinued following audits and 
lawsuits, at the moment of writing, more than 150 cities in the United 
States have adopted predictive policing (Electronic Frontier Foundation 
2021). Outside of the United States, China, Denmark, Germany, India, 
the Netherlands and the United Kingdom are also reported to have tested 
or deployed predictive policing tools. 
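The self-fulfilling loop described above can be made concrete with a deliberately simplified toy model. It is a hypothetical sketch under stated assumptions, not a description of any real predictive policing product: two neighbourhoods with identical underlying crime rates, a single patrol sent each day to whichever area has the most recorded incidents so far, and incidents recorded only where the patrol happens to be. The function name `simulate_patrols` and all numbers are illustrative.

```python
def simulate_patrols(initial_counts: list[int], true_rates: list[int], days: int) -> list[int]:
    """Hypothetical toy model of place-based predictive policing.

    Each day the single patrol goes to the area with the highest recorded
    count so far (the 'prediction'). New incidents are recorded only in the
    patrolled area, at that area's true daily rate; crime elsewhere goes
    unrecorded. Returns the recorded counts after `days` days.
    """
    recorded = list(initial_counts)  # copy so the caller's list is untouched
    for _ in range(days):
        # "Predict" the hot spot from past records alone; ties go to the lower index.
        target = max(range(len(recorded)), key=lambda i: recorded[i])
        # Only the patrolled area generates new records.
        recorded[target] += true_rates[target]
    return recorded


if __name__ == "__main__":
    # Both areas have the same underlying rate (1 incident a day),
    # but area 0 starts with one extra record.
    print(simulate_patrols([11, 10], [1, 1], 100))  # [111, 10]
```

After 100 simulated days the records show 111 incidents in one area and 10 in the other, although by construction both areas experienced the same amount of crime: a one-record head start, not any difference in behaviour, decides where the data accumulate. Real systems are far more complex, but the sketch isolates the mechanism critics describe.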

The problem with predictive policing has little to do with intentionality 
and a lot to do with the limits of computation. An algorithm is a
finite list of instructions designed to perform a computational task in order
to produce a result, i.e., an output of some kind. Each task is therefore
performed on the basis of a series of built-in assumptions which, far from
being unbiased, are not only obfuscated by the complexity of the algorithm
itself but also artfully hidden by the surrounding algorithmic discourse,
which socially legitimises its outputs as objective and reliable. The truth
is, however, that computers are extremely efficient and fast at automating
complex and lengthy processes, but they perform rather poorly when
it comes to decision-making and judgement. In the words of Danah Boyd 
(2016, 231): 


[...] if they [computers] are fed a pile of data and asked to identify 
correlations in that data, they will return an answer dependent solely on 
the data they know and the mathematical definition of correlation that they 
are given. Computers do not know if the data they receive is wrong, biased, 
incomplete, or misleading. They do not know if the algorithm they are told to 
use has flaws. They simply produce the output they are designed to produce 
based on the inputs they are given. 


Boyd gives the example of a traffic violation: a red light run by someone 
who is drunk versus by someone who is experiencing a medical emergency. If
the latter scenario is not embedded into the model as a specific exception,
then the algorithm will categorise both events as the same traffic violation.
The crucial difference in decision-making processes between humans and
algorithms is that humans are able to make a judgement based on a
combination of elements such as regulations, use cases, guidelines and,
fundamentally, environmental and contextual factors, whereas algorithms
still have a hard time mimicking the nature of human understanding.
Human understanding is fluid and circular, whilst algorithms are linear
and rigid. Furthermore, the data-sets on which computational decision-making
models are based are inevitably biased, incomplete and far from
accurate because they stem from the very same unequal, racist, sexist
and biased systems and procedures that the introduction of computational 
decision-making was intended to prevent in the first place. 
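Boyd’s red-light example can be sketched as a rigid rule. The following is a hypothetical illustration, not code from Boyd or from any real traffic system: the `Event` type, the `classify` function and the category labels are assumptions made for the example. What it shows is that the rule distinguishes only the circumstances someone thought to list in advance; everything else falls into the default category.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A hypothetical record of a driving event."""
    ran_red_light: bool
    context: str  # circumstance, e.g. "drunk" or "medical emergency"


def classify(event: Event, exceptions: frozenset[str] = frozenset()) -> str:
    """Apply a fixed rule. The context is consulted only to check whether it
    appears in the list of exceptions supplied by the rule's designer."""
    if not event.ran_red_light:
        return "no violation"
    if event.context in exceptions:
        return "excused"
    return "traffic violation"  # every unlisted circumstance ends up here


if __name__ == "__main__":
    drunk = Event(True, "drunk")
    emergency = Event(True, "medical emergency")
    print(classify(drunk), "|", classify(emergency))
    # traffic violation | traffic violation
    print(classify(emergency, frozenset({"medical emergency"})))
    # excused
```

Without the exception, the two events are indistinguishable to the rule; with it, only the exact circumstance that was anticipated is treated differently, and any circumstance nobody listed is again classed as a violation.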

Moreover, as systems become increasingly complex, what might be
perceived as one algorithm may in fact be many. Indeed, some systems
can reach a level of complexity so deep that understanding the intricacies
and processes according to which the algorithms perform the assigned tasks
becomes problematic at best, if at all possible (Gillespie 2014). Although
this may not always have serious consequences, it is nevertheless worthy of
close scrutiny, especially because complex ML algorithms are today used
extensively, and increasingly in systems that perform fundamental social
functions such as the already cited healthcare and law and order; yet, as a
matter of fact, they are still ‘poorly understood and under-theorized’ (Boyd
2018). Despite being assumed to be, and often advertised
as being, neutral, fair and accurate, each algorithm within these complex
systems is in fact built according to a set of assumptions and cultural values
that reflect the strategic choices made by its creators according to specific
logics, be these corporate or institutional.

Another largely distorted view surrounding digital and algorithmic 
discourse concerns data. Although algorithms and data are often thought 
to be two distinct entities independent from each other, they are in 
fact two sides of the same coin. To fully understand how an
algorithm operates the way it does, one needs to look at it in combination
with the data it uses, or better yet at how the data must be prepared for
the algorithm to function (Gillespie 2014). This is because in order for
algorithms to work properly, that is automatically, information needs to be
rendered into data, i.e., formalised according to categories that will define
the database records. This act of categorising is precisely where human
intervention hides. Gillespie pointedly remarks that, far from being a neutral
and unbiased operation, categorisation is in fact ‘a powerful
semantic and political intervention’ (Gillespie 2014, 171): deciding what
the categories are, and what belongs in a category and what does not, are
all powerful worldview assertions. Database design can therefore have
potentially enormous sociological implications, which to date have largely
been overlooked (ibid.).

A recent example of the larger repercussions of these powerful worldview
assertions concerns fashion companies for people with disabilities, whose
requests to advertise on Facebook have been systematically
rejected by Facebook’s automated advertising centre. Again, the reason
for the rejection is unlikely to have anything to do with intentional
discrimination against people with disabilities; instead, it lies in
the way fashion products for people with disabilities are identified (or
rather misidentified) by the Facebook algorithms that determine products’
compliance with Facebook policy. Specifically, these items were categorised
as ‘medical and health care products and services including medical devices’
and, as such, they violated Facebook’s commercial policy (Friedman 2021).
Although these companies had their ads approved after appealing against
Facebook’s decision, episodes like this one reveal not only the deep cracks in
ML models but, worse, the strong biases in society at large. To paraphrase
Kate Crawford, every classification system in machine learning contains
a worldview (Crawford 2021). In this particular case, the implicit bias in
Facebook’s database worldview would be that a person with a disability is not
believed to possibly have an interest in fashion as a form of self-expression.
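The mechanism at stake, a policy decision triggered by a category rather than by any intention, can be sketched with a hypothetical keyword classifier. Facebook’s actual review system is not public, and nothing below describes it: the keyword lists, the category names, the `RESTRICTED` set and the `categorise` and `review` functions are invented for illustration. What the sketch shows is that the outcome is fixed in advance by the designer’s choice of categories, of the words that signal them and of the order in which they are checked.

```python
# Hypothetical category rules: which words count as evidence of which category
# is a human design decision. Dicts keep insertion order (Python 3.7+), so
# "medical_devices" is always checked before "apparel".
CATEGORY_KEYWORDS: dict[str, set[str]] = {
    "medical_devices": {"adaptive", "wheelchair", "mobility", "medical"},
    "apparel": {"jeans", "dress", "shirt", "fashion"},
}

# Hypothetical policy: adverts in these categories are refused.
RESTRICTED = {"medical_devices"}


def categorise(ad_text: str) -> str:
    """Return the first category whose keywords appear in the advert text."""
    words = set(ad_text.lower().split())
    for category, keywords in CATEGORY_KEYWORDS.items():
        if words & keywords:  # any shared word triggers the category
            return category
    return "uncategorised"


def review(ad_text: str) -> str:
    """Approve or reject an advert purely on the basis of its category."""
    return "rejected" if categorise(ad_text) in RESTRICTED else "approved"


if __name__ == "__main__":
    print(review("Adaptive fashion jeans for wheelchair users"))  # rejected
    print(review("Slim fit fashion jeans"))                       # approved
```

The first advert is rejected not because any rule states that people with disabilities have no interest in fashion, but because the word lists and the order of the checks quietly encode that assumption.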

Despite the growing evidence as well as explicit acknowledgements, such as
Lisa Gitelman’s statement that ‘Raw Data is an oxymoron’ (Gitelman 2013),
in most of the public and academic discourse data continues
to be exalted as exact and unarguable, still mostly thought of as
a natural resource rather than a cultural, situated one. On the contrary,
it is the uncritical use of data to make predictions in matters of welfare,
homelessness, crime and child protection, to name but a few, that has
created systems that are, in Virginia Eubanks’ words, ‘Automating
Inequality’ (2017). The immediate, profound and dangerous consequence of the
indiscriminate use of automated systems is that the resulting decisions
are remorselessly blamed on the targeted individual and justified morally
through the legitimisation of practices believed to be evidence-based,
and therefore accurate and unbiased. This is what Boyd calls ‘dislocation of
liability’ (2016, 232), whereby decision-makers are distanced from the
humanity of those affected by automated procedures.

In this book, I advance a critique of the mainstream big data and
algorithmic discourse, which continues to fetishise data as impartial and
somewhat pre-existing and which obscures the subjective and interpretative
dimension of collecting, selecting, categorising and aggregating, i.e., the
act of making data. I argue that following the shift to the digital, rapidly
accelerated by the pandemic, a new set of notions, practices and values
needs to be devised in order to re-figure the way in which we conceptualise
data, technology, digital objects and, on the whole, the process of digital
knowledge creation. To this end, drawing on posthumanist studies (Braidotti 2017;
Braidotti and Fuller 2019; Braidotti 2019) and on recent theories of digital
cultural heritage (Cameron 2021), I present a novel framework:
the post-authentic framework. With this framework, I propose concepts,
practices and values that recognise the larger cultural relevance of digital
objects and of the methods used to create, analyse and visualise them.
Significantly, the post-authentic framework problematises digital objects
as unfinished, situated processes and acknowledges the limitations, biases
and incompleteness of the tools and methods adopted for their analysis in the
process of digital knowledge creation. In this way, the framework ultimately
introduces a counterbalancing narrative into the dominant positivist discourse,
which equates the removal of the human (in any case illusory) with the
removal of biases. Indeed, as the promises of a newly found freedom from
subjectivity are increasingly found to be false, the post-authentic framework
acts as a reminder that in our own time, computational technology is like
the Mechanical Turk of that earlier century.

Featuring a range of personal case studies and exploring a variety 
of applied contexts such as digital heritage practices, digital linguistic 
injustice, critical digital literacy and critical digital visualisation, I devote 
specific attention to four key aspects of knowledge creation in the digital: 
creation of digital material, enrichment of digital material, analysis of digital 
material and visualisation of digital material. My intention is to show
how contributions towards systemic change in research, and
by extension in society at large, can be made when collecting,
assessing, reviewing, enriching, analysing and visualising digital material.
Throughout the chapters, I use the post-authentic framework to discuss
these various case examples and to show that it is only through
conscious awareness of the delusional belief in the neutrality of data, tools,
methods, algorithms, infrastructures and processes (i.e., by acknowledging 
the human chess master hiding inside the Turk) that the embedded biases 
can be identified and addressed. 

My argument is closely related to the notion of ‘originary technicity’ 
(see, for instance, Heidegger 1977; Clark 1992; Derrida 1994; 
Beardsworth 1996; Stiegler 1998) which rejects the Aristotelian view 
of technology as merely utilitarian. Originary technicity claims that 
technology is not simply a tool that humans deploy for their own ends, 
because humans are always invested in the technology they develop. In this 
way, technology (e.g., AI and algorithms) becomes in turn a central node 
of knowledge and culture production, and the knowledge and culture
so produced shape humans and their vision of the world in a mutually
reinforcing cycle. Culture is incorporated in technology as it is built by
humans who then use technology to produce culture. Hence, just as the
very concept of absolute objectivity when adopting computational
techniques (or in general, for that matter) is an illusion, so are the notions
of ‘fully autonomous’ or ‘completely unbiased’ processes. An uncritical
approach to the use of computational methods, I maintain, not only
reinforces the very old schemes of obscure practices that digital
technology claims to break but, more importantly, can make society
worse.

This is a reality that can no longer be ignored and which can only 
be confronted through a reconfiguration of our model of knowledge 
creation. This re-examination would relinquish illusory positivistic notions 
and acknowledge digital processes as situated and partial, as an extremely 
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convoluted assemblage of components which are themselves part of wider 
networks of other entities, processes and mechanisms of interaction. 
Broadly, the argument that I advance is that the current model of 
knowledge must be re-figured to incorporate this critical awareness, ever 
more necessary in order to address the new challenges brought by the 
pandemic and the digital transformation of society. The shift 77 the digital 
has created a complexity that a model of knowledge supporting divisive 
positions (i.e., between on one side disciplines that are digital and therefore 
believed to be objective and on the other disciplines that are non-digital 
and therefore biased) cannot address. 

I start my argument for an urgent knowledge reconceptualisation by 
building upon posthuman critical theory (Braidotti 2017), which argues
that matter ‘is not organized in terms of dualistic mind/body
oppositions, but rather as materially embedded and embodied
subjects-in-process’ (16). In this regard, posthuman critical theory introduces the
helpful notion of monism (cfr. Chap. 2), in which the power of differences
is not denied but, at the same time, is not structured according to
principles of opposition and therefore does not function hierarchically
(ibid.). A model of knowledge in the digital equally abandons dichotomous
ideas that continue to be at the foundation of our conceptualisation of 
knowledge formation, such as digital vs non-digital positions, critical vs 
technological and, the biggest of all, that of the sciences vs the humanities. 


1.3 A TALE OF TWO CULTURES 


The hyper-specialisation of research that a discipline-based model of knowledge
creation inevitably entails, and the way such a solid structure impedes
rather than advances knowledge, have been debated in the academic forum
for years (e.g., Klein 1983; Thompson Klein 2004; Chubin et al. 1986;
Stehr and Weingart 2000; McCarty 2015). As the rigid organisation
into disciplines began to dissolve over the course of the twenty-first
century, observers started to suggest that the existing model of knowledge
production was increasingly inadequate to explain the world and that it
was in fact modern society itself that was calling for its reconceptualisation.
Weingart and Stehr (2000), for instance, proposed that ‘one may have to
add a postdisciplinary stage to the predisciplinary stage of the seventeenth
and eighteenth centuries and the disciplinary stage of the nineteenth and
twentieth centuries’ (ibid., xii). At the same time, however, the undeniable
amalgamation of disciplines was affecting areas of knowledge unevenly;
authors noticed, for example, that in fields such as the natural sciences, which
have a problem-solving orientation and where knowledge production is typically
fast, boundaries between disciplines were much more blurred than in the
humanities (ibid.).

The Digital Turn seemed capable of changing this tradition. The
dynamic and disruptive impact of the digital on knowledge creation, and
on humanities scholarship in particular, appeared to be correcting this
unevenness and making the humanities interdisciplinary. Scholars observed
how the digital was not only challenging and transforming structures 
of knowledge but that it was also creating new structures (e.g., digital 
humanities, digital history, digital cultural heritage) (Klein 2015; Cameron 
and Kenderdine 2007; Cameron 2007). The field of DH, it was argued, 
would in this sense be ‘naturally’ interdisciplinary as it provides new 
methods and approaches which necessarily require new practices and new 
ways of collaborating. Another ‘promise’ of DH was that of being able 
to ‘transform the core of the academy by refiguring the labor needed for 
institutional reformation’ (Klein 2015, 15). 

After the initial enthusiasm and despite many examples around the 
world of interdisciplinary initiatives, academic programmes, departments 
and centres (Stehr and Weingart 2000; Deegan and McCarty 2011; Klein 
2015), in twenty years the rigid division into disciplines has, however, not
changed much; it remains the persistently dominant model in use for knowledge
production, and true collaboration is on the whole rare (Deegan and
McCarty 2011, 2). Indeed, what these cases of interdisciplinarity show 
is a common trend: when disciplines share similar interests, rather than 
boundaries dissolving and merging as interdisciplinary discourse usually 
claims, what in fact tends to happen is that in order to respond to the new 
external challenges, disciplines further specialise and by leveraging their 
overlapping spaces, they create yet new fields. This modern phenomenon 
has been referred to as ‘The paradox of interdisciplinarity’ (Weingart 
2000): 


interdisciplinarity [...] is proclaimed, demanded, hailed, and written into 
funding programs, but at the same time specialization in science goes on 
unhampered, reflected in the continuous complaint about it. [...] The 
prevailing strategy is to look for niches in uncharted territory, to avoid 
contradicting knowledge by insisting on disciplinary competence and its 
boundaries, to denounce knowledge that does not fall into this realm as 
‘undisciplined.’ Thus, in the process of research, new and ever finer structures
are constantly created as a result of this behaviour. This is (exceptions 
notwithstanding) the very essence of the innovation process, but it takes 
place primarily within disciplines, and it is judged by disciplinary criteria of 
validation. (Weingart 2000, 26-27) 


The author argues that since the early nineteenth century, when
the separation and specialisation of science into different disciplines
took place, interdisciplinarity has been a promise: the promise of the unity of
science, which would be realised in the future by reducing the
fragmentation into disciplines. Today, however, interdisciplinarity seems to
have lost interest in that promise, as the discourse has shifted from the idea
of ultimate unity to that of innovation through a combination of variations
(ibid., 41). For example, in his essay Becoming Interdisciplinary, McCarty
(2015) draws a close parallel between the struggle to deal with the
overwhelming amount of research available after World War II, which inspired
Vannevar Bush’s Memex, and the situation of contemporary researchers.
Bush (1945) maintained that the investigator could not find time to deal
with the increasing amount of research, which had grown far beyond
anyone’s ability to make real use of the record. The difficulty was, in his
view, that if on the one hand ‘specialization becomes increasingly necessary
for progress’, on the other hand, ‘the effort to bridge between disciplines
is correspondingly superficial.’ The keyword on which we should focus our
attention, McCarty argues, is superficial (2015, 73):


Bush’s geometrical metaphor (superficies, having length or breadth without
thickness), though undoubtedly intended as merely a common adjective,
makes the point elaborated in another context by Richard Rorty
(2004/2002): that the implicit model of knowledge at work here privileges 
singular truth at depth, reached by the increasingly narrower focus of 
disciplinary specialization, and correspondingly trivializes plenitude on the 
surface, and so the bridging of disciplines. 


According to Rorty, being interdisciplinary does not mean looking for 
the one answer but going superficial, i.e., wide, to collect multiple voices 
and multiple perspectives (2004). It has been argued, however, that true 
collaboration requires a more fundamental shift in the way knowledge 
creation is conceived than simply studying a common question or problem 
from different perspectives (van den Besselaar and Heimeriks 2001; Deegan
and McCarty 2011). This would also include a deep understanding
of disciplines and approaches other than one’s own (Gooding 2020).
Indeed, the contemporary notion of interdisciplinarity based on the idea 
that innovation is better achieved by recombining ‘bits of knowledge 
from previously different fields’ into novel fields is bound to create more 
specialisation and therefore new boundaries (Weingart 2000, 40). 

The schism of the humanities between ‘mainstream humanities’ and 
digital humanities, and later between digital humanities and critical digital 
humanities, perfectly illustrates the issue. In 2012, Alan Liu wrote a 
provocative essay titled Where Is Cultural Criticism in the DH? (Liu 2012). 
The essay was essentially a plea for DH to embrace a wider engagement 
with the societal impact of technology. It was very much the author’s hope 
that the plea would help to convert this ‘deficit’ into ‘an opportunity’,
the opportunity being for DH to gain a long-overdue leadership role,
as opposed to a ‘servant’ role, within the humanities. In other words, if
digital humanists wanted finally to be recognised as legitimate partners of
‘mainstream humanities’, they needed to incorporate cultural criticism into
their practices and stop pushing buttons without reflecting on the power
of technology.

In the aftermath of Liu’s essay, reactions varied greatly with views 
ranging from even harsher accusations towards DH to more optimistic 
perspectives, and some also offering fully programmatic and epistemolog- 
ical reflections. Some scholars, for example, voiced strong concerns about 
the wider ramifications of the lack of cultural critique in DH, what has 
often been referred to as ‘the dark side of the digital humanities’ (Grusin 
2014; Chun et al. 2016), the association of DH with the ‘corporatist 
restructuring of the humanities’ (Weiskott 2017), neoliberalism (Allington 
et al. 2016), and white, middle-class, male dominance (Bianco 2012). Two 
controversial essays in particular, one published in 2016 by Allington et 
al. (op. cit.) and the other a year later by Brennan (2017), argued that,
in a little over a decade, the myopic focus of DH on neoliberal tooling
and distant reading had accomplished nothing except consistently pushing
aside what has always been the primary locus of humanities investigation:
intellectual practice. 

This view was also echoed by Grimshaw (2018) who indicted DH 
for going to bed with digital capitalism, ‘an online culture that is anti- 
diversity and enriching a tiny group of predominantly young white men’ 
(2). Unlike Weiskott (2017), however, who argued ‘There is no such
thing as “the digital humanities”’, meaning that DH is merely an
opportunistic investment and a marketing ploy that does not really alter the
core of the humanities, Grimshaw maintained that this kind of pandering
causes rot at the heart of humanistic knowledge and practice. This
he calls ‘functionalist DH’, the use of tools to produce information in
line with managerial metrics but with no significant knowledge value (6).
Grimshaw strongly criticises DH for having failed to keep the promise of
being a new discipline of emancipation and for being in fact ‘nothing
more than a tool for oppression’. The digital transformation of society,
he continues, has resulted in increased inequality, a wider economic gap,
an upsurge in monopolies and surveillance, lack of transparency of big
data, mobbing, trolling, online hate speech and misogyny. Rather than
resisting it, DH is guilty of having embraced such culture, of operating
within the framework of lucrative tech deals which perpetuate and reinforce
the neoliberal establishment. Digital humanists are establishment curators
and no longer capable of critical thought; DH is therefore totally unequipped
to rethink and criticise digital capitalism. Although he acknowledges the
emergence of critical voices within DH, he also strongly advocates a more
radical approach, which would then justify the need for a ‘new’ field,
an additional space within the university where critique, opposition and
resistance can happen (7). This space of resistance and critical engagement
with digital capitalism is, he proposes, critical digital humanities (CDH).

Over the years, other authors such as Hitchcock (2013), Berry (2014) 
and Dobson (2019) have also advocated critical engagement with the 
digital as the epistemological imperative for digital humanists and have 
identified CDH as the proper locus for such engagement to take place. For 
example, according to Hitchcock, humanists who use digital technology
must ‘confront the digital’, meaning that they must reflect on the contextual,
theoretical and philosophical aspects of the digital. For Berry, CDH
practice would allow digital humanists to explore the relationship between
critical theory and the digital, and it would be both research- and practice-led.
Equally for Dobson, digital humanists must endlessly question the
cultural dimension and historical determination of the technical processes
behind digital operations and tools. With perhaps the sole exception of
Grimshaw (op. cit.), who is not interested in practice-led digital enquiry,
the general consensus concerns the urgency of conducting critically engaged
digital work, that is, work that draws on the very essence of the humanities:
its intrinsic capacity for critique.

However, whilst these proposed methodologies do not differ dramatically
across authors, there seems to be disagreement about the scope of the
enquiry itself. In other words, the open question around CDH would not
concern so much the how (nor the why) as the what for. For example,
Dobson (2019) is not interested in a critical engagement with the digital
that aims to validate results; this would be a pointless exercise, as the
distinction between the subjectivity of an interpretative method and the
objectivity of both data and computational methods is illusory. He claims
(ibid., 46):


there is no such thing as contextless quantitative data. [...] Data are imag- 
ined, collected, and then typically segmented. [...] We should doubt any 
attempt to claim objectivity based on the notion of bypassed subjectivity 
because human subjectivity lurks within all data. This is because data do 
not merely exist in the world, but are abstractions imagined and generated 
by humans. Not only that, but there always remain some criteria informing 
the selection of any quantity of data. This act of selection, the drawing of 
boundaries that names certain objects a data-set introduces the taint of the 
human and subjectivity into supposedly raw, untouched data. 


As ‘there is no such thing as the “unsupervised”’ (ibid., 45), the aim of
CDH is to thoroughly critique any claimed objectivity of all computational
tools and methods, to be suspicious of presumed human-free approaches
and to acknowledge that complete de-subjectification is impossible. The
aim of CDH, he argues, is not to expand the set of questions in DH, as
in Berry and Fagerjord’s view (2017), but to challenge the very notion of
a completely objective approach. In this sense, CDH is the endless search
for a methodology, the very essence of humanistic enquiry.

Berry (2014) also starts from the assumption that the notion of objective
data is illusory; however, he reaches opposite conclusions about the
aim of CDH. For him and Fagerjord (2017), CDH would provide
researchers with a space to conduct technologically engaged work, that is, 
work that uses technology but also draws on a vast range of theoretical 
approaches (e.g., software studies, critical code studies, cultural/critical 
political economy, media and cultural studies). This would allow scholars 
from many critical disciplines to tackle issues such as the historical context 
of any technology used and its theoretical limitations, including, for
instance, a commitment to its political dimension. By doing so, CDH 
would address the criticism about the lack of cultural critique in DH and 
it would enrich DH with other forms of scholarly work (ibid., 175). In 
other words, by ‘fixing’ the field’s lack of critical engagement, the
function of CDH would be to strengthen DH, thus markedly diverging
from Dobson. 

Albeit from different epistemological points of view, these reflections 
share similar methodological and ethical concerns and question DH’s lack of 
critical engagement, be it historical, cultural or political. I argue, 
however, that this reasoning exposes at least three inconsistencies. Firstly, 
in earlier perspectives (e.g., Liu 2012), the sciences are deemed to be 
obviously superior to the humanities and yet, as soon as the computational 
is incorporated into the field, the value of the humanities seems to have 
decreased rather than increased. For example, Bianco (2012) advocates 
a change in the way digital humanists ‘legitimise’ and ‘institutionalise’ 
the adoption of computational practices in the humanities. Such a change 
would require not simply defending the legitimacy or advocating the 
‘obvious’ supremacy of computational practices but reinvesting in the 
word humanities in DH. The supremacy of the digital would then be 
understood as a combination of superiority, dominance and relevance that 
computational practices (and, by extension, the hard sciences; cf. ‘physics 
envy’) are believed to have over the humanities. However, as Grimshaw 
(2018) also argued later, in the process of incorporating the computational 
into their practices, the humanities forgot all about questions of power, 
domination, myth and exploitation and have become less and less like the 
humanities and more and more like a field of ‘execute’ button pushers. 
Despite acknowledging the illusion of objectivity, this view shows how 
deeply rooted in the collective unconscious is the myth surrounding 
technology and science, which firmly positions them as detached from 
human agency and distinctly separated from the humanities. 

Secondly, and following from the first point, these views all share a 
persistent dualistic, oppositional notion of knowledge which, in one form or 
another, under the guise of either freshly coined or well-seasoned terms, 
continues to reflect what Snow famously called ‘the two cultures’ of the 
humanities and the sciences (2013). Such separation is typically verbalised 
in competing concepts such as subjectivity vs objectivity, interpretative vs 
analytical and critical vs digital. Despite the use of terms that would suggest 
union (e.g., ‘incorporated’), the two cultures therefore remain clearly 
divided. This conceptualisation of knowledge creation, which continues to 
compartmentalise fields and disciplines, is, I argue, also reflected in the clear 
division between the humanities, DH and CDH. This model, I contend, is 
highly problematic because, besides promoting deep schisms, it inevitably 
leads disciplines to operate within a hierarchical, competitive structure in 
which they are far from equal. For example, Liu’s critique mirrors the 
persistent dichotomy of science vs humanities: due to its lack of cultural 
criticism (typical of the sciences but not of the humanities), DH is not 
humanities at all. DH may be instrumental to the humanities (i.e., the 
humanities are superior to DH but inferior to the sciences), but it is reduced 
to a servant role. Hence, if typical descriptions of DH as a space in which 
the two worlds, the sciences and the humanities, ‘meld’ initially seem to 
suggest a harmonious and egalitarian coexistence, in reality the way this 
relationship plays out is anything but. 

The third contradiction concerns the point that Berry and Fagerjord (2017), 
among others, make about the digital transformation of society: ‘The 
question of whether something is or is not “digital” will be increasingly 
secondary as many forms of culture become mediated, produced, accessed, 
distributed or consumed through digital devices and technologies’ (13). 
Humanists, they claim, must relinquish any comparative notion of digital 
vs analogue, as this contrast ‘no longer makes sense’ (ibid., 28). What 
humanists need to do instead, they continue, is to reflect critically on the 
computational and its ramifications in a dedicated space which, like 
Grimshaw and Dobson, they also suggest calling CDH, thus circling back 
to the second contradiction. If the humanities are critical and if the 
distinction between digital and analogue ‘no longer makes sense’, then by 
insisting on establishing a CDH, they fail to transcend the very distinction 
between digital and analogue that they themselves declare nonsensical. 

While I see the validity and truth in the debates that have animated past 
DH scholarship, I also argue that the reason for these inconsistencies is to 
be found in the specific model of knowledge within which these scholars 
still operate: a model in which knowledge is divided into competing 
disciplines. Behind the calls to relinquish divisions and embrace the 
digital lies a persistent disciplinary structure of knowledge which, despite 
its declared novelty, is bound to the epistemology of the last century. 
Instead, I maintain, we should not accommodate the digital within the 
existing disciplinary structure, as it is the structure of knowledge itself and 
its conceptualisation into separate fields and worldviews that has to change. 
The current model of knowledge creation, grounded in division and 
competition, is ill-equipped to explain the complexities of the world, and 
the 2020 pandemic has magnified the urgency of adopting a strong critical 
stance on the digital transformation of society. This cannot happen through 
the creation of niche fields, let alone exclusively within the humanities, but 
through a reconceptualisation of knowledge creation itself. 

The post-authentic framework that I propose in this book moves beyond 
the existing breakdown of disciplines, which I see as not only unhelpful 
and conceptually limiting but also harmful. The main argument of this 
book is that the question is no longer solely how the digital affects 
the humanities but how knowledge creation more broadly happens in 
the digital. Thinking in terms of yet another field (e.g., CDH) where 
computational science and critical enquiry would supposedly meet in this 
or that modulation, for this or that goal, still reiterates the same 
boundaries that hinder that enquiry. Similarly, claiming that DH scholarship 
conducts digital enquiry suggests that humanities scholarship does not 
happen in the digital; it therefore continually reproduces the outmoded 
distinction between digital and analogue as well as the dichotomy between 
digital/non-critical and non-digital/critical. Conversely, calls for a CDH 
presuppose that DH is never critical (or worse, that it cannot be critical 
at all) and that the humanities can (should?) continue to defer their 
appointment with the digital and disregard any matter of concern that 
has to do with it, ultimately implying that it is still possible to remain 
unconcerned by the digital. 

But the digital affects us all, including (perhaps especially) those who 
do not have access to it. The digital transformation exacerbates already 
existing inequalities in society, as the most vulnerable, such as migrants, 
refugees, internally displaced persons, older persons, young people, 
children, women, persons with disabilities, rural populations and 
indigenous peoples, are disproportionately affected by the lack of digital 
access. The digital lens provided by the 2020 pandemic has therefore 
magnified the inequality and unfairness that are deeply rooted in our 
societies. In this respect, on 18 July 2020, UN Secretary-General 
Antonio Guterres declared (United Nations 2020a): 


COVID-19 has been likened to an x-ray, revealing fractures in the fragile 
skeleton of the societies we have built. It is exposing fallacies and falsehoods 
everywhere: the lie that free markets can deliver healthcare for all; the fiction 
that unpaid care work is not work; the delusion that we live in a post-racist 
world; the myth that we are all in the same boat. While we are all floating on 
the same sea, it’s clear that some are in super yachts, while others are clinging 
to the drifting debris. 
The post-authentic framework that I propose in this book is a conceptual 
framework for knowledge creation in the digital; it rejects the 
view of the digital as crossing paths with disciplines, intersecting, melting, 
merging, meeting or any other verb that suggests that separate entities 
are converging while leaving the model of knowledge essentially unaffected. 
I maintain that this sort of worldview is obsolete, even dangerous; 
researchers can no longer justify statements such as ‘I’m not digital’, as 
we are all in the digital. But rather than seeing this transformation as a 
threat, some sort of bleak reality in which critical thinking no longer has a 
voice and everything is automated, I see it as an opportunity for change of 
historic proportions. Any process of transformation fundamentally changes 
all the parts involved; if we accept the notion of digital transformation 
with regard to society, we also have to acknowledge that, as much as the 
digital transforms society, the way society produces knowledge must also 
be transformed. This entails acknowledging that current frameworks of 
knowledge creation are unsuited to understanding the deep implications 
of technology for culture and knowledge and to meeting the global 
challenges made more complex by the digital. This book seeks to show how 
the digital acceleration brought about by the 2020 events adds new urgency 
to an issue that scholars identified some twenty years ago but that now 
cannot be postponed any further. Hall, for instance, argued (2002, 128): 


We cannot rely merely on the modern “disciplinary” methods and frame- 
works of knowledge in order to think and interpret the transformative effect 
new technology is having on our culture, since it is precisely these methods 
and frameworks that new technology requires us to rethink. 


I therefore suggest we stop using the term ‘interdisciplinarity’ altogether. 
As it contains the word discipline, albeit in reference to breaking, 
crossing and transcending disciplinary boundaries and all the other usual 
suspects that typically recur in interdisciplinarity discourse, I believe that 
the term continues to refer to the exact same notions of knowledge 
compartmentalisation that the digital transformation requires us to relinquish. 
In my view, thinking in these terms is not helpful and does not adequately 
respond to the consequences of the digital transformation that society, 
higher education and research have undergone. Based on separateness 
and individualism, the current model of knowledge creation restricts our 
ability to identify and access the various complexities of reality. Traditional 
binary views of deep/significant vs superficial/trivial, digital/non-critical 
vs non-digital/critical and the sciences vs the humanities may appear firm, 
but only because we exaggerate their fixity. Similarly, the separation into 
disciplines may seem inevitable and fixed, but in reality the majority of 
norms and views are arbitrary, neither unavoidable nor final and, therefore, 
completely alterable. Weingart, for instance, states (Weingart 2000, 39): 


The structures are by no means fixed and irreplaceable, but they are social 
constructs, products of long and complex social interactions, subject to social 
processes that involve vested interests, argumentation, modes of conviction, 
and differential perceptions and communications. 


With specific reference to the current model of knowledge creation, for 
example, Stichweh (2001) reminds us that the organisation of universities 
into academic departments is a relatively recent phenomenon, ‘an invention 
of nineteenth century society’ (13727); indeed, to paraphrase McKeon, 
the apparently monolithic integrity of disciplines as we know them may 
sometimes obscure a radically disparate and interdisciplinary core (1994). 
The argument I reiterate in this book is that the current landscape requires 
us to move on from this model, beyond (not away from) thick description of 
single-discipline case studies, and to recognise not only that knowledge is 
much more fluid than we are accustomed to think, but also that the digital 
transcends artificial disciplinary boundaries. 

In the chapters that follow, I take an auto-ethnographic and self-reflexive 
approach to show how the application of the post-authentic framework that 
I have developed has informed my practice as a humanist in the digital. 
More broadly, I show how the framework can guide a conceptualisation 
of knowledge creation that transcends disciplinary boundaries, especially 
digital vs non-digital positions. Thinking in terms of ‘in the digital’, and 
no longer ‘and the digital’, thus bears enormous potential for tangibly 
undisciplining knowledge, for introducing counter-narratives into the digital 
capitalistic discourse, for developing, encouraging and spreading a digital 
conscience and for taking an active part in the re-imagination of 
post-authentic higher education and research. The world has entered a new 
dimension in which knowledge can no longer afford to see technology 
and its production simply as instrumental and contextual or as an object of 
critique, admiration, fear or envy. In my view, the current landscape is much 
more complex and has much wider implications than those identified 
so far. In this book, I want to elaborate on them, not with the purpose of 
rejecting previous positions but to provide additional perspectives which 
I think are urgently required especially as a consequence of the 2020 
pandemic. 

In what is still predominantly a binary conceptual framework, e.g., 
the sciences vs the humanities, the humanities vs DH and DH vs CDH, 
this book provides a third way: knowledge creation in the digital. The 
book argues that the new paradigm shift in the digital (as opposed to 
towards the digital), accelerated considerably by the COVID-19 pandemic, 
positions knowledge creation beyond such outdated dichotomous 
conceptualisations. Technology develops at a blistering pace, but so does 
our capacity to misuse it, abuse it and do harm. It is therefore everyone’s 
duty to argue against any claimed computational neutrality but, more 
importantly, to relinquish outmoded and rather presumptuous perspectives 
that grant humanists alone a moral monopoly on criticism and critique. 
Indeed, as we are all in the digital, critical engagement cannot afford 
to remain limited exclusively to a handful of scholars who may or may 
not have an interest in practice-led digital research (but who are in the 
digital nevertheless), as this would tragically create more fragmentation, 
polarisation and ultimately harm. 

This is not a book about CDH, nor is it a book about DH, nor is it 
about the digital and the humanities or the digital in the humanities. What 
this book is about is knowledge in the digital. 


1.4 OH, THE PLACES YOU’LL GO! 


The digital transformation of society—and therefore of academia and 
of knowledge creation more generally—will not be stopped, let alone 
reversed. The claim I advance in this book is that, whilst a great deal of talk 
has so far revolved around the impact of the digital on individual fields, 
how the model of knowledge creation should be transformed accordingly 
has largely been overlooked. I argue that the increasing complexity of the 
world brought about by the digital transformation now demands a new 
model of knowledge to understand, explain and respond to the reality 
of ubiquitous digital data, algorithmic automated processes, computa- 
tional infrastructures, digital platforms and digital objects. I contend that 
such engagement should not come from a place of criticism per se but 
should be seized as a historic opportunity for truly 
decompartmentalising knowledge and reconfiguring the way we think 
about it. A decompartmentalised model of knowledge does not denature 
disciplines but breaks the current oppositional, hierarchical structure in 
which disciplines still operate. The digital transformation finally forces us 
to go back to the fundamental questions: how do we create knowledge and 
how do we want to train our next generation of students? 

Be it in the form of data, platforms, infrastructures or tools, across 
the humanities, scholars have pointed out the interfering nature of the 
digital at different levels and have called for a reconfiguration of how 
research practice is conceptualised (e.g., Cameron and Kenderdine 2007; 
Drucker 2011, 2020; Braidotti 2019; Cameron 2021; Fickers 2022). Fickers, 
for instance, proposes digital hermeneutics as a helpful framework to 
address both the archival and historiographical issues ‘raised by changing 
logics of storage, new heuristics of retrieval, and methods of analysis and 
interpretation of digitized data’ (2020, 161). In this sense, the digital 
hermeneutics framework combines critical reflection on historical practice 
with digital literacy, for instance by embedding digital source criticism, 
a reflection on the consequences for the epistemology of history of the 
transformation from sources to data through digitisation. 

With specific reference to cultural heritage concepts and their relation 
to the digital, Cameron (2007; 2021) refigures digital cultural heritage 
curation practices and digital museology by problematising digital cultural 
heritage as societal data, entities with their own forms of agency, 
intelligence and cognition (Cameron 2021). By reflecting on the wider 
consequences of the digital on heritage for future generations, including 
issues of Western perspectives, climate change, environmental destruction 
and injustice, Cameron proposes a more-than-human digital museology 
framework which recognises the impact of AI, automated systems and 
infrastructures as part of a wider ecology of components in digital cultural 
heritage practices. 

On the mediating role of the digital in the visual representation of 
material intended for humanistic enquiry, Drucker (2004; 2011; 2013; 2014; 
2020) has also long advocated a critical stance and a more problematised 
approach. She has, for example, proposed alternative ways of visualising 
digital material that would expose rather than hide the different stages 
of mediation, interpretation, selection and categorisation that typically 
disappear in the final graphical display. Her work introduces an important 
counter-narrative into the public and academic discourse, which 
predominantly exalts data, computational processes and digital 
visualisations as unarguable and exact. 
These contributions are all unmistakable signs of the decreasing relevance 
of the current model of knowledge production following the digital 
transformation of society, and they show that the notion of the digital as 
something that ‘happens’ to knowledge creation is now entirely anachronistic. 
At the same time, however, these approaches insist on disciplinary 
competence and are indeed modulated primarily within the fields and for 
the disciplines they originate from (e.g., digital history, digital cultural 
heritage, the humanities). The post-authentic framework that I propose 
here attempts to break with the ‘paradox of interdisciplinarity’ in relation 
to the digital, whereby knowledge is not truly undisciplined; rather, the 
digital is incorporated into existing fields and creates yet more fields, hence 
new boundaries. The post-authentic framework incorporates all these recent 
perspectives but at the same time goes beyond them; as it intentionally 
refers to digital objects rather than to the disciplines within which they 
are created, it provides an architecture for issues such as transparency, 
replicability, Open Access, sustainability, accountability and visual display 
with no specific reference to any discipline. 

I build my argument for applying the post-authentic framework to 
digital knowledge creation and digital objects upon recent theories of 
critical posthumanities (Braidotti 2017; Braidotti and Fuller 2019). In 
recognising that current terminologies and methods for posthuman 
knowledge production are inadequate, critical posthumanities offers a more 
holistic perspective on knowledge creation, and it is therefore particularly 
relevant to the argument I advance in this book. With specific reference 
to the need for novel notions that may guide a reconceptualisation of 
knowledge creation, Braidotti and Fuller (Braidotti 2017; Braidotti and 
Fuller 2019) propose Transversal Posthumanities, a theoretical framework 
for the Critical Posthumanities. With this framework, they introduce the 
concept of transversality, a term borrowed from geometry that refers to 
the understanding of spaces in terms of their intersection (Braidotti and 
Fuller 2019, 1). Although the main argument I advance in this book is 
also that of an urgent need for knowledge reconfiguration, I maintain 
that transversality still suggests a view of knowledge as solid, and thus 
it only partially breaks with the outdated conceptualisation of disciplinary 
compartmentalisation that it aims to relinquish. To actualise a remodelling 
of knowledge, I introduce two concepts: symbiosis and mutualism. In 
Chap. 2, I explain how the notion of symbiosis (from the Greek for ‘living 
together’) embeds in itself the principle of knowledge as fluid and 
inseparable. Similarly, borrowed from biology, the notion of mutualism 
proposes that areas of knowledge do not compete against each other but 
benefit from a mutually compensating relationship. Building on the notion 
of monism in posthuman theory (Braidotti and Fuller 2019, 16) (cf. Sect. 
1.2), in which differences are not denied but do not function 
hierarchically, symbiosis and mutualism help refigure our understanding 
of knowledge creation not as a space of conflict and competition but as a 
space of fluid interactions in which differences are understood as mutually 
enriching. 

Symbiosis and mutualism are central concepts of the post-authentic 
framework that I propose in this book, a theoretical framework for 
knowledge creation in the digital. If collaboration across areas of knowledge 
has so far been largely optional, often motivated more by a grant-seeking 
logic than by genuine curiosity, the digital calls for an actual change in 
knowledge culture. The question we should ask ourselves is not ‘How can 
we collaborate?’ but ‘How can we contribute to each other?’. Concepts such 
as symbiosis and mutualism could equally inform our answer when 
asking ourselves the question ‘How do we want to create knowledge and 
how do we want to train our next generation of students?’. 

To answer this question, the post-authentic framework starts by 
reconceptualising digital objects as much more complex entities than mere 
collections of data points. Digital objects are understood as the conflation 
of humans, entities and processes connected to each other according to 
the various forms of power embedded in computational processes and 
beyond, and which therefore bear consequences (Cameron 2021). As 
such, digital objects transcend traditional questions of authenticity because 
they are never finished, nor can they be finished. Countless 
versions can continuously be created through processes that are shaped 
by past actions and in turn shape the following ones. Thus, in the 
post-authentic framework, the emphasis is on both products and processes, 
which are acknowledged as never neutral and as incorporating external, 
situated systems of interpretation and management. Specifically, I take 
digitised cultural heritage material as an illustrative case of a digital object 
and I demonstrate how the post-authentic framework can be applied to 
knowledge creation in the digital. Throughout the chapters of this book, 
I devote specific attention to four key aspects of knowledge creation in 
the digital: creation of digital material in Chap. 2, enrichment of digital 
material in Chap. 3, analysis of digital material in Chap. 4, and visualisation 
of digital material in Chap. 5. 
The second content chapter, Chap. 3, focuses on the application of 
the post-authentic framework to the task of enriching digital material; 
I use DeXTER (DeepteXTminER)⁴ and ChroniclItaly 3.0 (Viola and 
Fiscarelli 2021a) as case examples. DeXTER is a workflow that implements 
deep learning techniques to contextually augment digital textual material; 
ChroniclItaly 3.0 is a digital heritage collection of Italian American 
newspapers published in the United States between 1898 and 1936. In the 
chapter, I show how symbiosis and mutualism have guided each action of 
DeXTER’s enrichment workflow, from pre-processing to data augmentation. 
My aim is to exemplify how the post-authentic framework can guide 
interaction with the digital not as a strategic (grant-oriented) or 
instrumental (task-oriented) collaboration but as a cognitive mutual 
contribution. I end the chapter by arguing that the task of augmenting the 
information of cultural heritage material carries the responsibility of 
building a source of knowledge for current and future generations. In 
particular, the use of methods such as named entity recognition (NER), 
geolocation and sentiment analysis (SA) requires a thorough understanding 
of the assumptions behind these techniques, constant updating and critical 
supervision. In the chapter, I specifically discuss the ambiguities and 
uncertainties of these methods and show how the post-authentic framework 
can help address these challenges. 

In Chap. 4, I illustrate how the post-authentic framework can be applied 
to the analysis of a digital object through the example of topic modelling, 
a distant reading method born in computer science and widely used in the 
humanities to mine large textual repositories. In particular, I highlight how, 
through a deep understanding of the assemblage of culture and technology 
in the software, the post-authentic framework can guide us towards 
exploring, questioning and challenging the interpretative potential of 
computation. Drawing on the mathematical concepts of discrete vs continuous 
modelling of information, in the chapter I reflect on the implications 
for knowledge creation of the transformation of continuous material into 
discrete form, binary sequences of 0s and 1s, and I focus especially on the 
notions of causality and correlation. I then illustrate the example of topic 
modelling as a computational technique that treats continuous material 
such as a collection of texts as discrete data. I bring critical attention to 
problematic aspects of topic modelling that are highly dependent on the 
sources: pre-processing, corpus preparation and deciding on the number of 
topics. The topic modelling example ultimately shows how post-authentic 
knowledge creation can be achieved through a sustained engagement with 
software, also in the form of a continuous exchange between processes 
and sources. Guided by symbiosis and mutualism, such dialogue maintains 
the interconnection between two parallel goals: output (any processed 
information) and outcome, the value resulting from the output (Patton 
2015). 

Operating within the post-authentic framework crucially means 
acknowledging digital objects as having far-reaching, unpredictable 
consequences; as the complex pattern of interrelationships among 
processes and actors continually changes, interventions and processes 
must always be critically supervised. One such process is the provision of 
access to digital material through visualisation. In Chap. 5, I argue that 
the post-authentic framework can help highlight the intrinsically dynamic, 
situated, interpreted and partial nature of computational processes and 
digital objects. Thus, whilst appreciating the benefits of visualising digital 
material, the framework rejects an uncritical adoption of digital methods 
and opposes the dominant discourse that still presents graphical techniques 
and outputs as exact, final, unbiased and true. In the chapter, I illustrate 
how the post-authentic framework can be applied to the visualisation of 
cultural heritage material by discussing two examples: efforts towards the 
development of a user interface (UI) for topic modelling and the design 
choices for developing the app DeXTER, the interactive visualisation 
interface that explores ChroniclItaly 3.0. Specifically, I present work 
done towards visualising the ambiguities and uncertainties of topic 
modelling, network analysis (NA) and SA, and I show how key concepts 
and methods of the post-authentic framework can be applied to digital 
knowledge visualisation practices. I centre my argumentation on how the 
acknowledgement of curatorial practices as manipulative interventions can 
be encoded in the interface. I end the discussion by arguing that it is in 
fact through the interface display of the ambiguities and uncertainties of 
these methods that the active and critical participation of the researcher 
is acknowledged as required, keeping digital knowledge honest and 
accountable. 

In the final chapter, Chap. 6, I review the main formulations of this book 
project and I retrace the key concepts and values at the foundation of the 
post-authentic framework proposed here. I end the chapter with a few 
additional propositions for remodelling the process of digital knowledge 
production that could be adopted to inform the restructuring of academic 
and higher education programmes. 


NOTES 


1. The Gartner Hype Cycle of technology is a cycle model that explains a 
generally applicable path a technology takes in terms of expectations. It states 
that the initial, overly positive reception is followed by a ‘Trough of 
Disillusionment’, during which the hype collapses due to disappointed 
expectations. Some technologies then manage to climb the ‘Slope of 
Enlightenment’ and eventually reach a plateau of steady productivity. 

2. This is not to be confused with Amazon Mechanical Turk, a crowdsourcing 
website that facilitates the remote hiring of ‘crowdworkers’ to 
perform on-demand tasks that cannot be handled by computers. It is 
operated under Amazon Web Services and is owned by Amazon. 

3. Compound annual growth rate. 

4. https://github.com/lorellav/DeXTER-DeepTextMiner. 


CHAPTER 2 


The Importance of Being Digital 


A perspective is by nature limited. It offers us one single vision of a landscape. 
Only when complementary views of the same reality combine are we capable 
of achieving fuller access to the knowledge of things. The more complex 
the object we are attempting to apprehend, the more important it is to 
have different sets of eyes, so that these rays of light converge and we can 
see the One through the many. That is the nature of true vision: it brings 
together already known points of view and shows others hitherto unknown, 
allowing us to understand that all are, in actuality, part of the same thing. 
(Grothendieck 1986) 


2.1 AUTHENTICITY, COMPLETENESS AND THE DIGITAL 


For the past twenty years, digital tools, technologies and infrastructures 
have played an increasingly determining role in framing how digital objects 
are understood, preserved, managed, maintained and shared. Even in tra- 
ditionally object-centred sectors such as cultural heritage, digitisation has 
become the norm: heritage institutions such as archives, libraries, museums 
and galleries continuously digitise huge quantities of heritage material. The 
most official indication of this shift towards the digital in cultural heritage is 
perhaps provided by UNESCO which, in 2003, recognised that the world’s 
documentary heritage was increasingly produced, distributed, accessed 
and maintained in digital form; accordingly, it proclaimed digital heritage 
as common heritage (UNESCO 2003). Unsurprisingly yet significantly, 
the acknowledgement was made in the context of endangered heritage, 
including digital, whose conservation and protection must be considered 
‘an urgent issue of worldwide concern’ (ibid.). 

The document also officially distinguished between heritage created 
digitally (from then on referred to as digitally born heritage), that is, 
heritage for which no other format but the digital object exists, and 
digitised heritage, heritage ‘converted into digital form from existing 
analogue resources’ (UNESCO 2003). Therefore, in line with heritage tradition, 
the underlying motivation behind digitisation was that of preserving cultural 
resources from feared deterioration or permanent disappearance. It has been 
argued, however, that by distinguishing between the two types of digital 
heritage, the UNESCO statement de facto framed the digitisation process 
as a heritagising operation in itself (Cameron 2021). Consequently, to 
the classic cultural heritage paradigm ‘preserved heritage = heritage worth 
preserving’, UNESCO added another layer of complexity: the equation 
‘digitised = preserved’ (ibid.). 

UNESCO’s acknowledgement of digital heritage and in particular of 
digitised heritage as common heritage has undoubtedly had profound 
implications for our understanding of heritage practices, material culture 
and preservation. For example, by officially introducing the digital in 
relation to heritage, UNESCO’s statement deeply affected traditional 
notions of authenticity, originality, permanent preservation and complete- 
ness which have historically been central to heritage conceptualisations. 
For the purposes of this book, I will simplify the discussion¹ by saying 
that more traditional positions have insisted that copies intrinsically lack 
the authority of the original object, what Benjamin famously called its 
‘aura’ (Benjamin 1939). Museum culture has conventionally revolved around 
these traditional, rigid rules of originality and authenticity, established 
as the values legitimising museums as the only accredited custodians of true 
knowledge. Historically, such an understanding of heritage has sadly gone 
hand in hand with a very specific discourse, one dominated by Western 
perspectives. These views have been based on the idea that old, grandiose 
sites and objects are the sole heritage worthy of preservation, an idea that 
has in turn perpetuated Western narratives of nation, class and science 
(ACHS 2012). 

More recent scholarship, however, has moved away from such object- 
centred views and reworked conventional conceptualisations of authentic- 
ity and completeness in relation to the digital (see for instance, Council 
on Library and Information Resources 2000; Jones et al. 2018; Gori- 
unova 2019; Zuanni 2020; Cameron 2021; Fickers 2021). From the 
1980s onwards, for example, the influence wielded by postmodern and 
postcolonial theories has challenged these traditional frameworks 
and brought new perspectives for the conceptualisation of material culture 
(see for instance, Tilley 1989; Vergo 1989). The key idea of this new 
approach, and the one most relevant to the arguments advanced in this 
book, is that material culture does not intrinsically possess any meanings; 
instead, meanings are ascribed to material culture when it is interpreted in 
the present. As Christopher Y. Tilley famously stated, ‘The meaning of the past 
does not reside in the past, but belongs in the present’ (Tilley 1989, 192). 
According to this perspective, the significance of material culture is not 
eternal and absolute but continually negotiated in a dialectical relationship 
with contemporary values and interactions. For example, in disciplines such 
as museum studies, this view takes the form of a critique of the social and 
political role of heritage institutions. Through this lens, museums are not 
seen as neutral custodians of material culture but as grounded in Western 
ideologies of elitism and power and representing the interests of only a 
minority of the population (Vergo 1989). 

Such considerations have led to the emergence of new disciplines such 
as Critical Heritage Studies (CHS). In CHS, heritage is understood as a 
continuous negotiation of past and present modularities, in the acknowledgement 
that heritage values are neither fixed nor universal but culturally 
situated and constantly co-constructed (Harrison 2013). 
Though still aimed at preserving and managing heritage for future generations, 
CHS are resolutely concerned with questions of power, inequality 
and exploitation (Hall 1999; Butler 2007; Winter 2011), thus sharing 
many of the same concerns as critical posthumanities (Braidotti 2019) 
and intersecting closely with the post-authentic framework I propose in 
this book. 

The official introduction of the digital in the context of cultural heritage 
has necessarily become intertwined with the political and ideological legacy 
concerning traditional notions of original and authentic vs copies and 
reproductions. Simplistically seen as mere immaterial copies of the original, 
digital objects could not but severely disrupt these fundamental values, in 
some cases even being framed as ‘terrorists’ (Cameron 2007, 51), 
that is, destabilising instruments of what is true and real. In an effort to 
defend material authenticity as the sole element defining meaning, digital 
artefacts were at best accorded an inferior status in comparison to the 
originals, a servant role to the real. 

The parallel with DH vs ‘mainstream humanities’ is hard to miss (cfr. 
Chap. 1). In 2012, Alan Liu defined DH as ‘ancillary’ to mainstream 
humanities (Liu 2012), whereas others (e.g., Allington et al. 2016; Brennan 
2017) claimed that incorporating the digital into the humanities violated, 
one might say polluted, their very essence, namely agency and criticality. 
In opposition to the analogue, the digital was seen as an immaterial, 
agentless, untrue and threatening entity undermining the authority of the 
original. As with digital heritage objects, these criticisms of DH did not 
problematise the digital but simplistically reduced it to a non-human, 
uncritical entity. 

Nowadays, this view is increasingly challenged by new conceptual 
dimensions of the digital; for instance, Jones et al. (2018) argue that ‘a 
preoccupation with the virtual object—and the binary question of whether 
it is or is not authentic—obscures the wider work that digital objects 
do’ (Jones et al. 2018, 350). Similarly, in her exploration of the digital 
subject, Olga Goriunova (2019) reworks the notion of distance in Valla 
and Benenson’s artwork in which a digital artefact is described as ‘neither 
an object nor its representation but a distance between the two’ (2014). 
Far from being a blank void, this distance is described as a ‘thick’ space in 
which humans, entities and processes are connected to each other (ibid., 
4) according to the various forms of power embedded in computational 
processes. According to this view, the concept of authenticity is considered 
in relation to the digital subject, i.e., the digital self, which is rethought 
as a much more complex entity than just a collection of data points and, 
at the same time, not quite a mere extension of the self. More recently, 
Cameron (2021) states that in the context of digital cultural heritage, the 
very conceptualisation of a digital object escapes Western ideas of curation 
practices, and authenticity ‘may not even be something to aspire to’ (15). 

This chapter expands on these recent positions, not because I 
disagree with the concepts and themes expressed by these authors, but 
because I want to add a novel reflection on digital objects, including 
digital heritage, and on both theory and practice-oriented aspects of 
digital knowledge creation more widely. I argue that such aspects are in 
urgent need of reframing not solely in museum and gallery practices, and 
heritage policy and management, but crucially also in any context of digital 
knowledge production and dissemination where an outmoded framework 
of discipline compartmentalisation persists. Taking digital cultural heritage 
as an illustrative case of a digital object typical of humanities scholarship, I 
devote specific attention to the way in which digitisation has been framed 
and understood and to the wider consequences for our understanding of 
heritage, memory and knowledge. 


2.2 DIGITAL CONSEQUENCES 


This book challenges traditional notions of authenticity by arguing for 
a reconceptualisation of the digital as an organic entity embedding past, 
present and future experiences which are continuously renegotiated during 
any digital task (Cameron 2021). Specifically, I expand on what Cameron 
calls the ‘ecological composition concept’ (ibid., 15), which she applies to 
digital cultural heritage curation practices, to include any action in a 
digital setting, which I likewise understand as bearing context and therefore 
consequences. She argues that the act of digitisation does not merely produce 
immaterial copies of analogue originals, as the 2003 UNESCO statement 
implied with reference to digitised cultural heritage; rather, by creating 
digital objects, it creates new things which in turn come alive and are 
therefore themselves subject to renegotiation. I further argue 
that any digital operation is equally situated and never neutral, as each 
incorporates external, situated systems of interpretation and management. 
For example, the digitisation of cultural heritage has been discursively 
legitimised as a heritagising operation, i.e., an act of preservation of cultural 
resources from deterioration or disappearance. Though certainly true to an 
extent, preservation is only one of the many aspects linked to digitisation 
and far from the only reason why governments and institutions have 
started to invest massively in it. In line with the wider benefits that 
digitisation is thought to bring to society at large (cfr. Chap. 1), the digitisation of cultural 
heritage is believed to serve a range of other more strategic goals such as 
fuelling innovation, creating employment opportunities, boosting tourism 
and enhancing visibility of cultural sites including museums, libraries and 
archives, all together leading to economic growth (European Commission 
2011). 

Inevitably, the process of cultural heritage digitisation itself has therefore 
become intertwined with questions of power, economic interests, ideolog- 
ical struggles and selection biases. For instance, after about two decades of 
major, large-scale investments in the digitisation of cultural heritage, self- 
reported data from cultural heritage institutions indicate that in Europe, 
only about 20% of heritage material exists in a digital format (Enumerate 
Observatory 2017), whereas globally, this percentage is believed to remain 
at 15%.² Behind these percentages, it is hard not to see the colonial 
ghosts of the past. CHS have problematised heritage designation not 
just as a magnanimous act of preserving the past, but as ‘a symbol of 
previous societies and cultures’ (Evans 2003, 334). When it comes to deciding 
which societies and whose cultures are to be symbolised, political and economic 
interests, power relations and selection biases are never far away. For example, particularly 
in the first stages of large-scale mass digitisation projects, special collections 
often became the prioritised material to be digitised (Rumsey and Digital 
Library Federation 2001), whereas less mainstream works and minority 
voices tended to be largely excluded. Typically, libraries needed to decide 
what to digitise based on cost-effectiveness analyses, and so their choices were 
often skewed by economic imperatives rather than ‘actual scholarly value’ 
(Rumsey and Digital Library Federation 2001). The UNESCO-induced 
paradigm ‘digitising = preserving’ helped to communicate the idea 
that any digitised material was intrinsically worth preserving, thus in 
turn perpetuating previous decisions about what had been worth keeping 
(Crymble 2021). 

There is no doubt that today’s under-representation of minority voices 
in digital collections directly mirrors decades of past decisions about what 
to collect and preserve (Lee 2020). In reference to early US digitisation 
programmes, for example, Rumsey Smith points out that as a direct 
consequence of this reasoning: 


foreign language materials are nearly always excluded from consideration, 
even if they are of high research value, because of the limitations of optical 
character recognition (OCR) software and because they often have a limited 
number of users. (Rumsey and Digital Library Federation 2001, 6) 


This has in turn had other repercussions. As most of the digitised 
material has been in English, tools and software for exploring and analysing 
the past have primarily been developed for the English language. Although 
in recent years greater awareness of issues of power, archival biases, 
silences in the archives and lack of language diversity in the context of 
digitisation has certainly developed, not just in archival and heritage studies 
but also in DH and digital history (see for instance, Risam 2015; Putnam 
2016; Earhart 2019; Mandell 2019; McPherson 2019; Noble 2019), the 
fact remains that most of that 15% is the sad reflection of this bitter legacy. 
Another example of the situated nature of digitisation is microfilming. 
In his famous investigative book Double Fold, Nicholson Baker (2002) doc- 
uments in detail the contextual, economic and political factors surrounding 
microfilming practices in the United States. Through a zealous investiga- 
tion, he tells us a story involving microfilm lobbyists, former CIA agents 
and the destruction of hundreds of thousands of historical newspapers. 
He pointedly questions the choices of high-profile figures in American 
librarianship such as Patricia Battin, former Head Librarian of Columbia 
University and the head of the American Commission on Preservation 
and Access from 1987 to 1994. From the analysis of government records 
and interviews with persons of interest, Baker argues that Battin and the 
Commission pitched the mass digitisation of paper records to charitable 
foundations and the American government by inventing the ‘brittle book 
crisis’, the apparent rapid deterioration that was destroying millions of 
books across America (McNally 2002). In reality, he maintains, her 
campaign was part of an agenda to supply content for microfilming 
technology. 

In advocating for preservation, Baker also discusses the limitations of 
digitisation and some specific issues with microfilming, such as loss of 
colour and quality and grayscale saturation. Over the years, such issues 
have had unpredictable consequences, particularly for images. In historical 
newspapers, some images used to be printed through a technique called 
rotogravure, a type of intaglio printing known for its good quality image 
reproduction and especially well-suited for capturing details of dark tones. 
Scholars (e.g., Williams 2019; Lee 2020) have pointed out how the 
grayscale saturation issue of microfilming directly affects images of Black 
people as it distorts facial features by achromatising the nuances. In the 
case of millions and millions of records of images digitised from microfilm 
holdings, such as the 1.56 million images in the Library of Congress’ 
Chronicling America collection, it has been argued that the microfilming 
process itself has acted as a form of oppression for communities of colour 
(Williams 2019). This, together with several other criticisms concerning 
selection biases, has led some authors to speak of ‘Chronicling White 
America’ (Fagan 2016). 

In this book I argue in favour of a more problematised conceptualisation 
of digital objects and digital knowledge creation as living entities that bear 
consequences. To build my argument, I draw upon posthuman critical theory, 
which understands matter as an extremely convoluted assemblage 
of components, ‘complex singularities relate[d] to a multiplicity of forces, 
entities, and encounters’ (Braidotti 2017, 16). Indeed, because of their 
deconstructive and disruptive approach, I believe posthumanities theories 
have great potential for refiguring traditional humanist forms of knowledge. 
Although I discuss examples of my own research based on digital cultural 
heritage material, my aim is to offer a counter-narrative beyond cultural 
heritage and with respect to the digitisation of society. My intention is to 
challenge the dominant public discourse that continues to depict the digital 
as non-human, agentless, non-authentic and contextless and by extension 
digital knowledge as necessarily non-human, cultureless and bias-free. The 
digitisation of society, sharply accelerated by the COVID-19 pandemic, has 
added complexity to reality, precipitating processes that have triggered 
reactions with unpredictable, potentially global consequences. I therefore 
maintain that with respect to digital objects, digital operations and to the 
way in which we use digital objects to create knowledge, it is the notion of 
the digital itself that needs reframing. In the next section, I introduce the 
two concepts that may inform such a radical reconfiguration: symbiosis and 
mutualism. 


2.3 SYMBIOSIS, MUTUALISM AND THE DIGITAL OBJECT 


This book recognises the inadequacy of the traditional model of knowledge 
creation, but it also contends that the 2020 pandemic-induced pervasive 
digitisation has added further urgency to the point that this change can 
no longer be deferred. Such a refigured model, I argue, must conceptualise 
the digital object as an organic, dynamic entity which lives and evolves 
and bears consequences. It is precisely the unpredictability and long-term 
nature of these consequences that now pose extremely complex questions 
which the current rigid, single discipline-based model of knowledge creation 
is ill-equipped to approach.³ This book is therefore an invitation for 
institutions as well as for us as researchers and teachers to address what 
it means to produce knowledge today, to ask ourselves how we want our 
digital society to be and what our shared and collective priorities are, and 
so to finally produce the change that needs to happen. 

Posthuman critical theory has proposed transversality as a new principle 
that goes beyond the constraints of canonical forms: ‘a pragmatic 
method to render problems multidimensional’ (Braidotti and Fuller 2019, 
1). With this notion of geometrical transversality that describes spaces ‘in 
terms of their intersection’ (ibid., 9), posthuman critical theory attempts to 
capture ‘relations between relations’. I argue, however, that the suggested 
image of a transversal cut across entities that were previously disconnected, 
e.g., disciplines, does not convey the idea of fluid exchanges; rather, it 
remains confined to ideas of separation and interdisciplinarity and therefore 
it only partially breaks with the outdated conceptualisations of knowledge 
compartmentalisation that it aims to disrupt. The term transversality, I 
maintain, ultimately continues to frame knowledge as solid and essentially 
separated. 

This book firmly opposes notions of divisions, including a division of 
knowledge into monolithic disciplines, as they are based on models of 
reality that support individualism and separateness, which in turn inevitably 
lead to conflict and competition. To support my argument of an urgent 
need for knowledge reconfiguration and for new terminologies, I propose 
to borrow the concept of symbiosis from biology. The notion of symbiosis 
(from the Greek for ‘living together’) refers in biology to the close and 
long-term cooperation between different organisms (Sims 2021). Applied to 
knowledge remodelling and to the digital, symbiosis radically breaks with 
the current conceptualisation of knowledge as a separate, static entity, 
linear and fragmented into multiple disciplines and of the digital as an 
agentless entity. On the contrary, the term symbiosis points to the continual 
renegotiation in the digital of interactions, past, present and future systems, 
power relations, infrastructures, interventions, curations and curators, 
programmers and developers (see also Cameron 2021). 

Integral to the concept of symbiosis is that of mutualism, an interaction 
from which all the organisms involved benefit. Mutualism opposes 
interspecific competition, that is, when organisms from different 
species compete for a resource, resulting in a benefit to only one of the 
individuals or populations involved (Bronstein 2015). I maintain that the 
current rigid separation into disciplines resembles an interspecific competition 
dynamic, as it creates the conditions under which knowledge production 
has become a space of conflict and competition. As this dynamic is not only 
outdated and inadequate but deeply concerning, I argue that the 
contemporary notion of knowledge should not simply be redefined but 
reconceptualised altogether. Symbiosis and mutualism 
embed in themselves the principle of knowledge as fluid and inseparable in 
which areas of knowledge do not compete against each other but benefit 
from a mutually compensating relationship. When asking ourselves the 
questions ‘How do we produce knowledge today?’, ‘How do we want our 
next generation of students to be trained?’, the concepts of symbiosis and 
mutualism may guide the new reconfiguration of our understanding of 
knowledge in the digital. 

Symbiosis and mutualism are central notions for the development of a 
more problematised conceptualisation of digital objects and digital knowl- 
edge production. Expanding on Cameron’s critique of the conceptual 
attachment to digital cultural heritage as possessing a complete quality 
of objecthood (Cameron 2021, 14), I maintain that it is not just digital 
heritage and digital heritage practices that escape notions of completeness 
and authenticity but in fact all digital objects and all digital knowledge 
creation practices. According to this conceptualisation, any intervention 
on the digital object (e.g., an update, data augmentation interventions, 
data creation for visualisations) should always be understood as the sum of 
all the previously made and concurrent decisions, not just by the present 
curator/analyst, but by external, past actors, too (see for instance, the 
example of microfilming discussed in Sect. 2.2). These decisions in turn 
shape and are shaped by all the following ones in an endless cycle that 
continually transforms and creates new object forms, all equally alive, all 
equally bearing consequences for present and future generations. This is 
what Cameron calls the ‘more-than-human’, a convergence of the human 
and the technical. 

I maintain, however, that the ‘more-than-human’ formulation still 
presupposes a lack of human agency in the technical (the supposedly non- 
human) and therefore a yet again binary view of reality. In Cameron’s view, 
the more-than-human arises from the encounter of human agency with the 
technical, which therefore would not possess agency per se. But agency 
does not emerge solely from the interconnections between, say, the 
curator (what could be seen as ‘the human’) and the technical components 
(i.e., ‘the non-human’), because there is no concrete separation between the 
human and the technical; in truth, there is no such thing as neutral 
technology (see Sect. 1.2). For example, in early large-scale 
digitisation projects, past decisions about what to (not) digitise have 
eventually led to the current English-centric predominance of data-sets, 
software libraries, training models and algorithms. Using this technology 
today helps to reinforce Western, white worldviews not just in digital 
practices, but in society at large. 

Hence, whereas Cameron believes that framing digital heritage as ‘possessing 
a fundamental original, authentic form and function [...] is limiting’ 
(ibid., 12), I elaborate further and maintain that it is in fact misleading. 
Indeed, in constituting and conceptualising digital objects, the question of 
whether they are or are not authentic simply does not make sense: digital objects 
transcend authenticity; they are post-authentic. To conceptualise digital 
objects as post-authentic means to understand them as unfinished processes 
that embed a wide net of continually negotiable relations of multiple 
internal and external actors, past, present and future experiences; it means 
to look at the human and the technical as symbiotic, non-discriminable 
elements of the digital’s immanent nature which is therefore understood as 
situated and consequential. To this end, I introduce a new framework that 
could inform practices of knowledge reconfiguration: the post-authentic 
framework. The post-authentic framework problematises digital objects 
by pointing to their aliveness, incompleteness and situatedness, to their 
entrenched power relations and digital consequences. Throughout the 
book, I will unpack key theoretical concepts of the post-authentic framework 
and, through the illustration of four concrete examples of knowledge 
creation in the digital (creation of digital material, enrichment of digital 
material, analysis of digital material and visualisation of digital material), 
I will evaluate its full implications for knowledge creation. 


2.4 CREATION OF DIGITAL OBJECTS 


The post-authentic framework acknowledges digital objects as situated, 
unfinished processes that embed a wide net of continually negotiable 
relations of multiple actors. It is within the post-authentic framework that 
I describe the creation of ChroniclItaly 3.0 (Viola and Fiscarelli 2021a), a 
digital heritage collection of Italian American newspapers published in the 
United States by Italian immigrants between 1898 and 1936. I take the 
formation and curation of this collection as a use case to demonstrate how 
the post-authentic framework can inform the creation of a digital object 
in general, reacting to and impacting on institutional and methodological 
frameworks for knowledge creation. In the case of ChroniclItaly 3.0, this 
includes effects on the very conceptualisation of heritage and heritage 
practices. 

Being the third version of the collection, ChroniclItaly 3.0 is in itself a 
demonstration of the continuously and rapidly evolving nature of digital 
research and of the intrinsic incompleteness of digital objects. I created 
the first version of the collection, ChroniclItaly (Viola 2018) within the 
framework of the Transatlantic research project Oceanic Exchanges (OcEx) 
(Cordell et al. 2017). OcEx explored how advances in computational 
periodicals research could help historians trace and examine patterns of 
information flow across national and linguistic boundaries in digitised 
nineteenth-century newspaper corpora. Within OcEx, our first priority 
was therefore to study how news and concepts travelled between Europe 
and the United States and how, by creating intricate entanglements of 
informational exchanges, these processes resulted in transnational linguistic 
and cultural contact phenomena. Specifically, we wanted to investigate 
how historical newspapers and Transatlantic reporting shaped social and 
cultural cohesion between Europeans in the United States and in Europe. 
One focus was specifically on the role of migrant communities as nodes 
in the Transatlantic transfer of culture and knowledge (Viola and Verheul 
2019a). As the main aim was to trace the linguistic and cultural changes 
that reflected the migratory experience of these communities, we first 
needed to obtain large quantities of diasporic newspapers that would 
be representative of the Italian ethnic press at the time. Because of the 
project’s time and cost constraints, such sources needed to be available for 
computational textual analysis, i.e., already digitised. This is why I decided 
to machine-harvest the digitised Italian American newspapers from 
Chronicling America,⁴ the Open Access, Internet-based Library of Congress 
directory of digitised historical newspapers published in the United States 
from 1777 to 1963. Chronicling America is also an ongoing digitisation 
project which involves the National Digital Newspaper Program (NDNP), 
the National Endowment for the Humanities (NEH), and the Library of 
Congress. Started in 2005, the digitisation programme continuously adds 
new titles and issues through the funding of digitisation projects awarded 
to external institutions, mostly universities and libraries, and thus in itself 
it encapsulates the intrinsic incompleteness of digital infrastructures and 
digital objects and the far-reaching network of influencing factors and 
actors involved. 

This wider net of interrelations, which shapes how digital objects come 
into being and which equally shaped the ChroniclItaly collections, 
can be exemplified by the criteria for receiving a Chronicling America 
grant. In line with the NDNP’s main aim ‘to create a national digital 
resource of historically significant newspapers published between 1690 
and 1963, from all the states and U.S. territories’ (NEH 2021, 1, emphasis 
mine), institutions should digitise approximately 100,000 newspaper 
pages representing their state. How this significance is assessed depends 
on four principles. First, titles should represent the political, economic 
and cultural history of the state or territory; second, titles recognised as 
‘papers of record’, that is containing ‘legal notices, news of state and 
regional governmental affairs, and announcements of community news 
and events’ are preferred (ibid., 2). Third, titles should cover the majority 
of the population areas, and fourth, titles with longer chronological runs 
and that have ceased publication are prioritised. Additionally, applicants 
must commit to assembling an advisory board including scholars, teachers, 
librarians and archivists to inform the selection of the newspapers to be 
digitised. The requirement that most heavily conditions which titles are 
included in Chronicling America, however, is the existence of a complete, 
or largely complete microfilm ‘object of record’ with priority given to 
higher-quality microfilms. This technical requirement is adopted for 
reasons of efficiency and cost; however, as past microfilming practices 
in the United States were entrenched in a complex web of interrelated 
factors (cfr. Sect. 2.2), the impact of this criterion on 
the material included in the directory incorporates issues such as previous 
decisions of what was worth microfilming and more importantly, what was 
not. 

Furthermore, to ensure consistency across the diverse assortment of 
institutions involved over the years and throughout the various grant cycles, 
the programme provides awardees with further technical guidelines. At 
the same time, however, these guidelines may cause over-representation 
of larger or mainstream publications; therefore, to counterbalance this 
issue, titles that give voice to under-represented communities are highly 
encouraged. Although these effects are mitigated by multiple review stages
(i.e., by each state awardee’s advisory board, by the NEH and by peer review
experts), the very constitutional structure of Chronicling America reveals
the far-reaching net of connections, economic and power relations, multiple
actors and factors influencing the decisions about what to digitise.
Significantly, it exposes how digitisation processes are intertwined with 
individual institutions’ research agendas and how these may still embed 
and perpetuate past archival biases. 

The creation of ChroniclItaly therefore ‘inherits’ all these decisions 
and processes of mediation and in turn embeds new ones such as those 
stemming from the research aims of the project within which it was created, 
i.e., OcEx, and the expertise of the curator, i.e., myself. At this stage, for 
example, we decided not to intervene on the material with any enrichment
operation, as ChroniclItaly mainly served as the basis for a combination
of discourse and text analyses that could help us investigate the extent
to which diasporic communities functioned as nodes and contact zones
in the Transatlantic transfer of information. 

As we explored the collection further, we realised however that to limit 
our analyses to text-based searches would not exploit the full potential of 
the archive; we therefore expanded the project with additional grant money 
earned through the Utrecht University’s Innovation Fund for Research in 
IT. We made a case for the importance of experimenting with computa- 
tional methodologies that would allow humanities scholars to identify and 
map the spatial dimension of digitised historical data as a way to access 
subjective and situational geographical markers. It is with this aim in mind 
that I created ChroniclItaly 2.0 (Viola 2019), the version of the collection 
annotated with referential entities (i.e., people, places, organisations). As 
part of this project, we also developed the app GeoNewsMiner (GNM)⁵
(Viola et al. 2019). This is an interactive graphical user interface (GUI) 
to visually and interactively explore the references to geographical entities 
in the collection. Our aim was to allow users to conduct historical, finer- 
grained analyses such as examining changes in mentions of places over time 
and across titles as a way to identify the subjective and situational dimension 
of geographical markers and connect them to explicit geo-references to 
space (Viola and Verheul 2020a). 

The creation of the third version of the collection, ChroniclItaly 
3.0, should be understood in the context of yet another project, 
DeepteXTminER (DeXTER)⁶ (Viola and Fiscarelli 2021b), supported
by the Luxembourg Centre for Contemporary and Digital History’s
(C²DH, University of Luxembourg) Thinkering Grant. As its name, a blend of
the verbs tinkering and thinking, suggests, this grant funds research applying the
method of ‘thinkering’: ‘the tinkering with technology combined with 
the critical reflection on the practice of doing digital history’ (Fickers 
and Heijden 2020). As such, the scheme is specifically aimed at funding 
innovative projects that experiment with technological and digital tools for 
the interpretation and presentation of the past. Conceptually, the C²DH
itself is an international hub for reflection on the methodological and
epistemological consequences of the Digital Turn for history;⁷ it serves
as a platform for engaging critically with the various stages of historical 
research (archiving, analysis, interpretation and narrative) with a particular 
focus on the use of digital methods and tools. Physically, it strives to 
actualise interdisciplinary knowledge production and dissemination by 
fostering ‘trading zones’ (Galison and Stump 1996; Collins et al. 2007), 
working environments in which interactions and negotiations between 
different disciplines can happen (Fickers and Heijden 2020). Within this
institutional and conceptual framework, I conceived DeXTER as a post- 
authentic research activity to critically assess and implement different 
state-of-the-art natural language processing (NLP) and deep learning 
techniques for the curation and visualisation of digital heritage material. 
DeXTER’s ultimate goal was to bring the utilised techniques into as close 
an alignment as possible with the principle of human agency (cfr. Chap. 3). 

The larger ecosystem of the ChroniclItaly collections thus exemplifies 
the evolving nature of digital objects and how international and national 
processes interweave with wider external factors, all impacting differen- 
tially on the objects’ evolution. The existence of multiple versions of 
ChroniclItaly, for example, is in itself a reflection of the incompleteness 
of the Chronicling America project to which titles, issues and digitised 
material are continually added. ChroniclItaly and ChroniclItaly 2.0 include 
seven titles and issues from 1898 to 1920 that portray the chronicles of 
Italian immigrant communities from four states (California, Pennsylvania, 
Vermont, and West Virginia); ChroniclItaly 3.0 expands the two previous 
versions by including three additional titles published in Connecticut and 
pushing the overall time span to 1898–1936. In terms of size,
ChroniclItaly 3.0 almost doubles the number of pages included in
its predecessors (8,653 vs 4,810). This is
a clear example of how the formation of a digital object is impacted by the 
surrounding digital infrastructure, which in turn is dependent on funding 
availability and whose very constitution is shaped by the various research 
projects and the involved actors in its making. 


2.5 THE IMPORTANCE OF BEING DIGITAL 


Understanding digital objects as post-authentic objects means acknowledging
them as part of the complex interaction of countless factors and
dynamics and to recognise that the majority of such factors and dynamics 
are invisible and unpredictable. Due to the extreme complexity of inter- 
related forces at play, the formidable task of writing both the past in the 
present and the future past demands careful handling. This is what Braidotti 
and Fuller call ‘a meaningful response move from the relatively short chain 
of intention-to-consequence [...] to the longer chains of consequences in 
which chance becomes a more structural force’ (2019, 13). Here chance is 
understood as the unpredictable combination of all the numerous known 
and unknown actors involved, conscious and unconscious biases, past,
present and future experiences, and public, private and personal interests.
With specific reference to the ChroniclItaly collections, for example, in
addition to the multiple factors already discussed as influencing their
creation, many of which date back decades, the very nature of this digital
object and of its content bears significance for our conceptualisation of
digital heritage and, more broadly, for digital knowledge creation practices.

The collections collate immigrant press material. The immigrant press 
represents the first historical stage of the ethnic press, a phenomenon 
associated with the mass migration to the Americas between the 1880s 
and 1920s, when it is estimated that over 24 million people from all 
around the world arrived in America (Bandiera et al. 2013). Indeed, as
immigrant communities grew exponentially, so did the immigrant
press: at the turn of the twentieth century, about 1300 foreign-language
newspapers were being printed in the United States with an estimated 
circulation of 2.6 million (Bjork 1998). By giving immigrants all sorts 
of practical and social advice—from employment and housing to religious 
and cultural celebrations and from learning English to acquiring American 
citizenship—these newspapers truly helped immigrants to transition into 
American society. As immigrant newspapers quickly became an essential 
element at many stages in an immigrant’s life (Rhodes 2010, 48), the 
immigrant press is a particularly valuable resource not only
for studying the lives of many of the communities that settled in the United 
States but also for opening a comprehensive window onto the American 
society of the time (Viola and Verheul 2020a). 

As far as the Italians were concerned, it has been calculated that by 1920,
they represented more than 10% of the non-US-born population
(about 4 million) (Wills 2005). The Italian community was also among
the most prolific producers of newspapers; between 1900 and 1920, there
were 98 Italian titles that managed to publish uninterruptedly, whereas at
its publication peak, this number ranged between 150 and 264 (Deschamps
2007, 81). In terms of circulation, in 1900, 691,353 Italian newspapers
were sold across the United States (Park 1922, 304), while in New York
alone, the circulation ratio of the Italian daily press is calculated as one 
paper for every 3.3 Italian New Yorkers (Vellon 2017, 10). Distribution 
and circulation figures should however be doubled or perhaps even tripled, 
as illiteracy levels were still high among this generation of Italians and 
newspapers were often read aloud (Park 1922; Vellon 2017; Viola and 
Verheul 2019a; Viola 2021). 
These impressive figures on the whole may point to the influential role of
the Italian language press not just for the immigrant community but within 
the wider American context, too. At a time when the mass migrations were 
causing a redefinition of social and racial categories, notions of race, civilisa- 
tion, superiority and skin colour had polarised into the binary opposition of 
white/superior vs non-white/inferior (Jacobson 1998; Vellon 2017; Viola
and Verheul 2019a). The whiteness category, however, was rather complex 
and not at all based exclusively on skin colour. Jacobson (1998) for instance 
describes it as ‘a system of “difference” by which one might be both white 
and racially distinct from other whites’ (ibid., 6). Indeed, during the
period covered by the ChroniclItaly collections, immigrants were granted
‘white’ privileges depending not on how white their skin might have been
but rather on how white they were perceived to be (Foley 1997). Immigrants in
the United States who were experiencing this uncertain social identity 
situation have been described as ‘conditionally white’ (Brodkin 1998), 
‘situationally white’ (Roediger 2005) and ‘inbetweeners’ (among others 
Barrett and Roediger 1997; Guglielmo and Salerno 2003; Guglielmo 
2004; Orsi 2010). 

This was precisely the complicated identity and social status of Italians, 
especially of those coming from Southern Italy; because of their challeng- 
ing economic and social conditions and their darker skin, both other ethnic 
groups and Americans considered them socially and racially inferior and
often discriminated against them (LaGumina 1999; Luconi 2003). For 
example, Italian immigrants would often be excluded from employment and
housing opportunities and be victims of social discrimination, exploita- 
tion, physical violence and even lynching (LaGumina 1999; Connell and 
Gardaphé 2010; Vellon 2010; LaGumina 2018; Connell and Pugliese 
2018). The social and historical importance of Italian immigrant newspapers
lies in how they advocated for the rights of the community they
represented, crucially acting as powerful forces of inclusion, community
building and national identity preservation, as well as tools of language
and cultural retention. At the same time, because this advocacy was often
paired with the condemnation of American discriminatory practices, these
newspapers also played a decisive role in transforming American society
at large, undoubtedly contributing to the tangible shaping of the country.
The immigrant press and the ChroniclItaly collections can therefore be 
an extremely valuable source to investigate specifically how the internal 
mechanisms of cohesion, class struggle and identity construction of the 
Italian immigrant community contributed to transforming America.
Lastly, these collections can also bring insights into the Italian immigrants’
role in the geographical shaping of the United States. The majority
of the 4 million Italians who had arrived in the United States, mostly
uneducated and mostly from the south, had done so as the result of
chain migration. Naturally, they settled close to relatives and
friends, creating self-contained neighbourhoods clustered according to
different regional and local affiliations (MacDonald and MacDonald 1964).
Through the study of the geographical places contained in the collections 
as well as the place of publication of the newspapers’ titles, the ChroniclItaly 
collections provide an unconventional and traditionally neglected source 
for studying the transforming role of migrants for host societies. 

On the whole, however, the novel contribution of the ChroniclItaly 
collections comes from the fact that they allow us to devote attention to 
the study of historical migration as a process experienced by the migrants 
themselves (Viola 2021). This is rare because, in discourse-based migration
research, the analysis tends to focus on discourse on migrants, rather than 
by migrants (De Fina and Tseng 2017; Viola 2021). Instead, through the 
analysis of migrants’ narratives, it is possible to explore how displaced 
individuals dealt with social processes of migration and transformation 
and how these affected their inner notions of identity and belonging. 
A large-scale digital discourse-based study of migrants’ narratives creates 
a mosaic of migration, a collective memory constituted by individual 
stories. In this sense, the importance of being digital lies in the fact that 
this information can be processed on a large scale and across different
migrant communities. The digital therefore also offers the possibility,
perhaps unimaginable before, of a kaleidoscopic view that simultaneously
apprehends historical migration discourse as a combination of inner and 
outer voices across time and space. Furthermore, as records are regularly 
updated, observations can be continually enriched, adjusted, expanded, 
recalibrated, generalised or contested. At the same time, mapping these 
narratives creates a shimmering network of relations between the past 
migratory experiences of diasporic communities and contemporary migra- 
tion processes experienced by ethnic groups, which can also be compared 
and analysed both as active participants and spectators. 

Abby Smith Rumsey said that the true value of the past is that it is 
the raw material we use to create the future (Rumsey 2016). It is only 
through gaining awareness of these spatio-temporal correspondences that
the past can become part of our collective memory and, by preventing us 
from forgetting it, of our collective future. Understanding digital objects 
through the post-authentic lens entails that great emphasis must be placed
on the processes that generate the mappings of the correspondences. The
post-authentic framework recognises that these processes cannot be neutral
as they stem from systems of interpretation and management which are
situated and therefore partial. These processes are never complete, nor can
they be completed, and as such they require constant updating and critical
supervision.

In the next chapter, I will illustrate the second use case of this book,
data augmentation; the case study demonstrates that the task of enriching 
a digital object is a complex managerial activity, made up of countless 
critical decisions, interactions and interventions, each one having conse- 
quences. The application of the post-authentic framework for enriching 
ChroniclItaly 3.0 demonstrates how symbiosis and mutualism can guide 
how the interaction with the digital unfolds in the process of knowledge 
creation. I will specifically focus on why computational techniques such 
as optical character recognition (OCR), named entity recognition (NER), 
geolocation and sentiment analysis (SA) are problematic and I will show 
how the post-authentic framework can help address the ambiguities and 
uncertainties of these methods when building a source of knowledge for 
current and future generations. 


NOTES 


1. For a review of the discussion, see, for example, Cameron (2007, 2021). 
2. https://www.marktechpost.com/2019/01/30/will-machine-learning-enable-time-travel/.

3. These may sometimes be referred to as ‘wicked problems’; see, for instance, 
Churchman (1967), Brown et al. (2010), and Ritchey (2011). 

4. https://chroniclingamerica.loc.gov/.

5. https://utrecht-university.shinyapps.io/GeoNewsMiner/.

6. https://github.com/lorellav/DeXTER-DeepTextMiner.

7. https://www.c2dh.uni.lu/about.
CHAPTER 3


The Opposite of Unsupervised 


When you control someone’s understanding of the past, you control their 
sense of who they are and also their sense of what they can imagine becoming. 
(Abby Smith Rumsey, 2016) 


3.1 ENRICHMENT OF DIGITAL OBJECTS 


After the initial headlong rush to digitisation, libraries, museums and other 
cultural heritage institutions realised that simply making sources digitally 
available did not ensure their use; what in fact became apparent was that 
as the body of digital material grew, users’ engagement decreased. This 
was rather disappointing but more importantly, it was worrisome. Millions 
had been poured into large-scale digitisation projects, pitched to funding 
agencies as the ultimate Holy Grail of cultural heritage (cfr. Chap. 2), a 
safe, more efficient way to protect and preserve humanity’s artefacts and 
develop new forms of knowledge, simply unimaginable in the pre-digital 
era. Although some of it was true, what had not been anticipated was the 
increasing difficulty experienced by users in retrieving meaningful content,
a difficulty that grew with the pace of digital expansion. Especially
when this difficulty was compounded by poor interface design, users were
left frustrated, overwhelmed and dissatisfied with the overall experience.

Thus, to secure a return on their investment in digitisation, institutions
urgently needed novel approaches to maximise the potential of their digital
collections. It soon became obvious that the solution was to simplify and
improve the process of exploring digital archives, to make information
retrievable in more valuable ways and the user experience more meaningful
on the whole. Given the wider incorporation of technology across
all sectors, it is not at all surprising that ML and AI have been more than
welcomed to the digital cultural heritage table. AI is particularly
appreciated for its capacity to automate lengthy and tedious processes that
nevertheless enhance exploration and retrieval for more in-depth
analyses, such as the task of annotating large quantities of digital
textual material with referential information. Indeed, as this technology
continues to develop together with new tools and methods, it is more and 
more used to help institutions fulfil the main purposes of heritagisation: 
knowledge preservation and access. 

One widespread way to enhance access is through ‘content enrichment’ 
or just enrichment for short. It consists of a wide range of techniques 
implemented to achieve several goals from improving the accuracy of 
metadata for better content classification to annotating textual content
with contextual information, the latter typically used for tasks such as 
discovering layers of information obscured by data abundance (see, for 
instance, Taylor et al. 2018; Viola and Verheul 2020a). There are at least 
four main types of text annotation: entity annotation (e.g., named entity 
recognition—NER), entity linking (e.g., entity disambiguation), text clas- 
sification and linguistic annotation (e.g., parts-of-speech tagging—POS). 
Content enrichment is also often used by digital heritage providers to 
link collections together or to populate ontologies that aim to standardise
procedures for the preservation of digital sources and to facilitate their
retrieval and exchange (among others Albers et al. 2020; Fiorucci et al. 2020).

The theoretical relevance of performing content enrichment, especially 
for digital heritage collections, lies precisely in its great potential for discov- 
ering the cultural significance underneath referential units, for example, by 
cross-referencing them with other types of data (e.g., historical, social,
temporal). We enriched ChroniclItaly 3.0 for NER, geocoding and sentiment analysis
within the context of the DeXTER project. Informed by the post-authentic 
framework, DeXTER combines the creation of an enrichment workflow 
with a meta-reflection on the workflow itself. Through this symbiotic 
approach, our intention was to prompt a fundamental rethink of both the 
way digital objects and digital knowledge creation are understood and the 
practices of digital heritage curation in particular. 
It is all too often assumed that enrichment, or at least parts of it,
can be fully automated, unsupervised and even launched as a one-step
pipeline. Preparing the material for computational analysis, for example,
a step often ambiguously referred to as ‘cleaning’, is typically
presented as something not worthy of particular critical scrutiny. We
are misleadingly told that operations such as tokenisation, lowercasing,
stemming, lemmatisation and the removal of stopwords, numbers, punctuation
marks or special characters do not need to be problematised, as they are
rather tedious, ‘standard’ operations. My intention here is to show that
it is, on the contrary, paramount that any intervention on the material
is tackled critically. When preparing the material for further processing,
full awareness of the curator’s influential role is required, as each
action taken triggers different chain reactions and will therefore
output a different version of the material. Choosing one operation
over another influences how the algorithms will process such material and
ultimately, how the collection will be enriched, the information accessed, 
retrieved and finally interpreted and passed on to future generations (Viola 
and Fiscarelli 2021b, 54). 
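The claim that each ‘standard’ operation outputs a different version of the material can be made concrete with a minimal Python sketch. It is not the DeXTER code: the `preprocess` function, the invented Italian sentence and the two-word stopword list are hypothetical, and stemming and lemmatisation are left out because they require external language resources.

```python
import re

def preprocess(text, lowercase=False, drop_punct=False, drop_stopwords=False,
               stopwords=frozenset()):
    """Return the token list produced by one combination of 'cleaning' choices."""
    # Tokenise into runs of word characters or single non-space symbols.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    if lowercase:
        tokens = [t.lower() for t in tokens]
    if drop_punct:
        # Keep only tokens made entirely of word characters.
        tokens = [t for t in tokens if re.fullmatch(r"\w+", t)]
    if drop_stopwords:
        tokens = [t for t in tokens if t.lower() not in stopwords]
    return tokens

# Invented sample sentence and a deliberately tiny, hypothetical stopword list.
SAMPLE = "La Camera del Senato approva la legge."
STOPWORDS = frozenset({"la", "del"})

version_a = preprocess(SAMPLE)
version_b = preprocess(SAMPLE, lowercase=True, drop_punct=True,
                       drop_stopwords=True, stopwords=STOPWORDS)
print(version_a)  # ['La', 'Camera', 'del', 'Senato', 'approva', 'la', 'legge', '.']
print(version_b)  # ['camera', 'senato', 'approva', 'legge']
```

In version B the organisation name no longer survives as a capitalised, contiguous sequence, and the sentence boundary is gone: two consequences that matter for NER and sentence-level SA respectively.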

Broadly, the argument I present provokes a discussion and critique of the 
fetishisation of empiricism and technical objectivity not just in humanities 
research but in knowledge creation more widely. It is this critical and hum- 
ble awareness that reduces the risks of over-trusting the pseudo-neutrality 
of processes, infrastructures, software, categories, databases, models and 
algorithms. The creation and enrichment of ChroniclItaly 3.0 show how 
the conjuncture of the implicated structural forces and factors cannot be 
envisioned as a network of linear relations and as such, cannot be predicted. 
The acknowledgement of the limitations and biases of specific tools and 
choices adopted in the curation of ChroniclItaly 3.0 takes the form of a 
thorough documentation of the steps and actions undertaken during the 
process of creation of the digital object. In this way, it is not just the product, 
however incomplete, that is seen as worthy of preservation for current and 
future generations, but also equally the process (or indeed processes) for 
creating it. Products and processes are unfixed and subject to change; they
transcend questions of authenticity; they allow room for multiple versions, 
all equally post-authentic, in that they may reflect different curators and 
materials, different programmers, rapid technological advances, changing 
temporal frameworks and values. 
3.2 PREPARING THE MATERIAL


Which preparatory operations one should perform before enrichment, and how
to assess them critically, depends on internal factors such as the
language of the collection; the type of material; the specific enrichment 
tasks to follow as well as external factors such as the available means and 
resources, both technical and financial; the time-frame, the intended users 
and research aims; the infrastructure that will store the enriched collection; 
and so forth. Indeed, far from being ‘standard’, each intervention needs to 
be specifically tailored to individual cases. Moreover, since each operation 
is effectively an additional layer of manipulation, it is fundamental that
scholars, heritage operators and institutions assess carefully to what degree 
they want to intervene on the material and how, and that their decisions 
are duly documented and motivated. In the case of ChroniclItaly 3.0, 
for example, the documentation of the specific preparatory interventions 
taken towards enriching the collection, namely, tokenisation, removing 
numbers and dates, and removing words with fewer than two characters and
special characters, is embedded as an integral part of the actual workflow.
I wanted to signal the need for refiguring digital knowledge creation 
practices as honest and fluid exchanges between the computational and 
human agency, counterbalancing the narrative that depicts computational 
techniques as autonomous processes from which the human is (should be?) 
removed. Thus, treating this as a post-authentic project, I have considered
each action as part of a complex web of interactions between the multiple 
factors and dynamics at play with the awareness that the majority of 
such factors and dynamics are invisible and unpredictable. Significantly, 
the documentation of the steps, tools and decisions serves the valuable 
function of acknowledging such awareness for contemporary and future 
generations. 
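The idea of embedding documentation in the workflow itself can be sketched in Python as a pipeline in which every step carries its own rationale and logs what it removed. This is only an illustration under stated assumptions: the exact rules used for ChroniclItaly 3.0 are not given here, so the regular expressions, the set of preserved punctuation marks, the `Step`/`run` names and the sample sentence are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical patterns; not the rules actually used for ChroniclItaly 3.0.
TOKEN = re.compile(r"\d+(?:[/.-]\d+)*|\w+|[^\w\s]")
NUMBER_OR_DATE = re.compile(r"\d+(?:[/.-]\d+)*")
SYMBOL = re.compile(r"[^\w\s]")
PUNCTUATION = {".", ",", ";", ":", "!", "?"}  # kept as sentence delimiters

@dataclass
class Step:
    name: str
    keep: Callable[[str], bool]  # True if a token survives the step
    rationale: str               # documented reason for the intervention

STEPS = [
    Step("remove numbers and dates",
         lambda t: not NUMBER_OR_DATE.fullmatch(t),
         "numbers and dates are dropped before enrichment"),
    Step("remove special characters",
         lambda t: t in PUNCTUATION or not SYMBOL.fullmatch(t),
         "symbols are dropped; punctuation is kept for sentence splitting"),
    Step("remove words shorter than two characters",
         lambda t: len(t) >= 2 or t in PUNCTUATION,
         "single-character tokens are dropped, punctuation excepted"),
]

def run(text, steps):
    """Apply each step in order, logging what it removed and why."""
    tokens = TOKEN.findall(text)
    log = [{"step": "tokenise", "tokens_out": len(tokens)}]
    for step in steps:
        before = len(tokens)
        tokens = [t for t in tokens if step.keep(t)]
        log.append({"step": step.name, "removed": before - len(tokens),
                    "rationale": step.rationale})
    return tokens, log

# Invented example sentence.
tokens, log = run("Il 12/03/1915 il piroscafo * partì con 1500 emigranti e marinai.", STEPS)
print(tokens)  # ['Il', 'il', 'piroscafo', 'partì', 'con', 'emigranti', 'marinai', '.']
for entry in log:
    print(entry)
```

Keeping the log next to the output is the design point: a later reader can see that the conjunction ‘e’ disappeared because of the length rule, not because of an unrecorded decision.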

This process can be envisioned as a continuous dialogue between 
human and artificial intelligence and it can be illustrated by describing 
how we handled stopwords (e.g., prepositions, articles, conjunctions) 
and punctuation marks when preparing ChroniclItaly 3.0 for enrichment. 
Typically, stopwords are considered semantically non-salient and even
potentially disruptive to the algorithms’ performance; as such, they are
normally removed automatically. However, as stopwords are of course
language-bound, removing them indiscriminately can hinder later analyses,
with more destructive consequences than keeping them. Thus, when
enriching ChroniclItaly 3.0, we considered two fundamental factors: the 
language of the data-set—Italian—and the enrichment actions to follow,
namely, NER, geocoding and SA. For example, we considered that in 
Italian, prepositions are often part of locations (e.g., America del Nord— 
North America), organisations (e.g., Camera del Senato—the Senate) 
and people’s names (e.g., Gabriele d’Annunzio); removing them could
have negatively interfered with how the NER model had been trained to 
recognise referential entities. Similarly, in preparation for performing SA 
at sentence level (cfr. Sect. 3.4), we did not remove punctuation marks: in
Italian, punctuation marks are the typical sentence delimiters and are
therefore indispensable for identifying sentence boundaries.
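Both considerations can be checked with a small Python sketch. The stopword set is a tiny hypothetical subset of Italian function words, the sentence splitter is deliberately naive (it is not the one used in DeXTER), and the example text is invented.

```python
import re

# A tiny, hypothetical subset of Italian function words; real lists are far longer.
STOPWORDS = {"in", "del", "della", "di", "la", "il", "e"}

def remove_stopwords(tokens):
    return [t for t in tokens if t.lower() not in STOPWORDS]

def has_phrase(tokens, phrase):
    """True if `phrase` occurs as a contiguous run of tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def split_sentences(text):
    """Naive splitter: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

# Invented example text.
text = "Molti emigrarono in America del Nord. La Camera del Senato approvò la legge!"
words = re.findall(r"\w+", text)
entity = ["America", "del", "Nord"]

print(has_phrase(words, entity))                    # True
print(has_phrase(remove_stopwords(words), entity))  # False: the place name is broken
print(len(split_sentences(text)))                   # 2
no_punct = re.sub(r"[^\w\s]", "", text)
print(len(split_sentences(no_punct)))               # 1: the boundary is lost
```

Removing the preposition turns ‘America del Nord’ into the non-existent sequence ‘America Nord’, and stripping punctuation collapses two sentences into one, which would distort any sentence-level sentiment score.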

Another operation that we critically assessed concerns the decision
whether to lowercase the material before performing NER and geocoding. 
Lowercasing text before performing other actions can be a double-edged 
sword. For example, if lowercasing is not implemented, a NER algorithm 
will likely process tokens such as ‘USA’, ‘Usa’, ‘usa’, ‘UsA’ and ‘uSA’ as 
distinct items, even though they may all refer to the same entity. This may 
turn out to be problematic as it could provide a distorted representation 
of that particular entity and how it is connected to other elements in the 
collection. On the other hand, if the material is lowercased, it may become 
difficult for the algorithm to identify ‘usa’ as an entity at all, which may
result in a high number of false negatives, thus equally skewing the output. 
We, once again, intervened as human agents: we considered that entities 
such as persons, locations and organisations are typically capitalised in 
Italian and therefore, in preparation for NER and geocoding, lowercasing 
was not performed. However, once these steps were completed, we did 
lowercase the entities and following a manual check, we merged multiple 
items referring to the same entity. This method allowed us to obtain a 
more realistic count of the number of entities identified by the algorithm 
and resulted in a significant redistribution of the entities across the different 
titles, as I will discuss in Sect. 3.3. Albeit more accurate, this approach did 
not come without problems and repercussions; many false negatives are 
still present and therefore the tagged entities are NOT all the entities in 
the collections. I will return to this point in Chap. 5. 
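The order of operations described above (NER on the original casing, then lowercasing and a manual check before merging) can be sketched in Python. The sketch is hypothetical: the mention list is invented, and the `approved` set stands in for the human review, whose decisions would in practice come from a person rather than from code.

```python
from collections import Counter

def merge_case_variants(mentions, approved):
    """Count entity mentions, merging case variants only where a reviewer approved it.

    mentions: surface forms returned by NER run on the non-lowercased text
    approved: lowercased forms confirmed as one entity during a manual check
    """
    counts = Counter()
    for surface in mentions:
        key = surface.lower()
        # Unreviewed forms keep their original casing and are counted separately.
        counts[key if key in approved else surface] += 1
    return counts

# Hypothetical NER output and review decisions, for illustration only.
mentions = ["USA", "Usa", "usa", "UsA", "uSA", "Roma", "ROMA", "Italia"]
approved = {"usa", "roma"}

before = Counter(mentions)
after = merge_case_variants(mentions, approved)
print(len(before))  # 8
print(after)        # Counter({'usa': 5, 'roma': 2, 'Italia': 1})
```

Eight apparently distinct items collapse into three, which is the kind of redistribution of entity counts the merge can produce; it cannot, however, recover entities the model never tagged, so false negatives remain untouched.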

The decision we took to remove numbers, dates and special characters 
is also a good example of the importance of being deeply engaged 
with the specificity of the source and how that specificity changes the 
application of the technology through which that engagement occurs. Like 
the large majority of the newspapers collected in Chronicling America, the 
pages forming ChroniclItaly 3.0 were digitised primarily from microfilm 
holdings; the collection therefore presents the same issues common to 
OCR-generated searchable texts (as opposed to born-digital texts), such
as errors derived from low readability of unusual fonts or very small char- 
acters. However, in the case of ChroniclItaly 3.0, additional factors must 
be considered when dealing with OCR errors. The newspapers aggregated 
in the collection were likely digitised by different NDNP awardees, who 
probably employed different OCR engines and/or chose different OCR 
settings, thus ultimately producing different errors which in turn affected 
the collection’s accessibility in an unsystematic way. Like all ML prediction
models, OCR engines embed the various biases encoded not only in the 
OCR engine’s architecture but more importantly, in the data-sets used for 
training the model (Lee 2020). These data-sets typically consist of sets of 
transcribed typewritten pages which embed the human subjectivity (e.g., 
spelling errors) as well as individual decisions (e.g., spelling variations). 
All these factors have wider, unpredictable consequences. As previously 
discussed in reference to microfilming (cfr. Sect. 2.2), OCR technology has 
raised concerns regarding marginalisation, particularly with reference to 
the technology’s consequences for content discoverability (Noble 2018; 
Reidsma 2019). These scholars have argued that this issue is closely related 
to the fact that the most widely implemented OCR engines are both
licensed and opaquely documented; they therefore not only reflect the 
strategic, commercial choices made by their creators according to specific 
corporate logics but they are also practically impossible to audit. Despite 
being promoted as ‘objective’ and ‘neutral’, these systems incorporate 
prejudices and biases, strong commercial interests, third-party contracts 
and layers of bureaucratic administration. Nevertheless, this technology is 
implemented on a large scale and it therefore deeply impacts what—on a 
large scale—is found and lost, what is considered relevant and irrelevant, 
what is preserved and passed on to future generations and what will not 
be, what is researched and studied and what will not be accessed. 
Understanding digital objects as post-authentic entails being mindful 
of all the alterations and transformations occurring prior to accessing the 
digital record and how each one of them is connected to wider networks 
of systems, factors and complexities, most of which are invisible and 
unpredictable. Similarly, any subsequent intervention adds further layers
of manipulation and transformation which incorporate the previous ones
and which will in turn have future, unpredictable consequences. For
example, in Sect. 2.2 I discussed how previous decisions about what was
worth digitising dictated which languages were prioritised, in turn
determining which training data-sets were compiled for different language
models, leading to the current strong bias towards English models, data-sets
and tools and to an overall digital language and cultural injustice.
Although the non-English content in Chronicling America has been
reviewed by language experts, many additional OCR errors may have
originated from markings on the material pages or from the generally poor
condition of the physical object. Again, the specificity of the source adds
further complexity to the many problematic factors involved in its
digitisation; in the case of ChroniclItaly 3.0, for example, we found that
OCR errors were often rendered as numbers and special characters. To
alleviate this issue, we decided to remove such items from the collection.
This step affected the material unevenly, not just across titles but even
across issues of the same title. Figure 3.1 shows, for example, the impact
of this operation on Cronaca Sovversiva, one of the newspapers collected in
ChroniclItaly 3.0 with the longest publication record, spanning almost the
entire period covered by the archive (1903-1919). On the whole, this
intervention reduced the total number of tokens from 30,752,942 to
21,454,455, meaning that about 30% of the overall material was removed
(Fig. 3.2). Although with sometimes substantial variation, we found the
overall OCR quality to be generally better in the most recent texts. This
characteristic is shared by most OCRed nineteenth-century newspapers, and
it has been ascribed to a better conservation status or better initial
condition of the originals, which overall improved over time (Beals and
Bell 2020). Figure 3.3 shows the variation of removed material in L’Italia,
the largest newspaper in the collection, comprising 6489 issues published
uninterruptedly from 1897 to 1919.

Fig. 3.2 Impact of pre-processing operations on ChroniclItaly 3.0 per title.
Figure taken from Viola and Fiscarelli (2021b)
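The cleaning step and the token arithmetic described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's code: the regular expressions (that is, what counts as a 'special character'), whitespace tokenisation and the function names are all hypothetical choices. The last line reproduces the reported reduction from the two token totals.

```python
import re

# Hypothetical cleaning rules (not the project's actual code): digits are
# removed, as is any character that is not a letter, whitespace or basic
# punctuation. \w also matches the underscore, so it is listed explicitly.
DIGITS = re.compile(r"\d+")
SPECIAL = re.compile(r"[^\w\s.,;:!?'’]|_")


def clean(text: str) -> str:
    """Remove numbers and special characters, then normalise whitespace."""
    text = DIGITS.sub(" ", text)
    text = SPECIAL.sub(" ", text)
    return " ".join(text.split())


def token_count(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def reduction(before: int, after: int) -> float:
    """Percentage of tokens removed relative to the original total."""
    return 100 * (before - after) / before


sample = "Il 12 maggio 1915 ~~ la c0lonia §§ italiana"
cleaned = clean(sample)
print(cleaned)  # Il maggio la c lonia italiana
print(f"{reduction(token_count(sample), token_count(cleaned)):.1f}% removed")  # 33.3% removed

# Corpus-level totals reported in the text above.
print(f"{reduction(30_752_942, 21_454_455):.1f}% removed")  # 30.2% removed
```

Note how the OCR-corrupted c0lonia is split into two fragments rather than repaired: removal reduces noise, but it cannot restore what the OCR engine misread, which is one reason why the impact of the operation varies across titles and issues.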

Finally, my experience of previously working on the GeoNewsMiner (GNM)
project (Viola et al. 2019) also influenced the decisions we took when
enriching ChroniclItaly 3.0. As said in Sect. 2.4, GNM loads ChroniclItaly
2.0, the version of the ChroniclItaly collections annotated with referential
entities before any of the pre-processing tasks described here in reference
to ChroniclItaly 3.0 were performed. A post-tagging manual check revealed
that, even though the F1 score of the NER model (a standard measure of a
model’s performance that combines precision and recall) was 82.88, due to
OCR errors the locations occurring fewer than eight times were in fact
false positives (Viola et al. 2019; Viola and Verheul 2020a). Hence, the
interventions we made on ChroniclItaly 3.0 aimed at reducing the OCR
errors to increase the discoverability of elements that were not identified
in the GNM project. When researchers are not involved in the creation
of the applied algorithms or in choosing the data-sets for training them
(which, especially in the humanities, is the majority of cases) and
consequently when tools, models and methods are simply reused as part of
the available resources, the post-authentic framework can provide a critical
methodological approach to address the many challenges involved in the
process of digital knowledge creation.

Fig. 3.3 Variation of removed material (in percentage) across issues/years of
L’Italia
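As a reminder of what the F1 figure summarises: it is the harmonic mean of precision (the share of predicted entities that are correct) and recall (the share of actual entities that are found), so a high overall F1 can coexist with many false positives among rare items. The sketch below computes F1 from hypothetical counts and applies a frequency cut-off like the one suggested by the GNM check. The counts, the place names and the automatic filter are illustrative assumptions; in GNM the false positives were identified through a manual check.

```python
from collections import Counter


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from raw counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def frequent_places(mentions, min_count=8):
    """Keep place names mentioned at least min_count times (hypothetical rule)."""
    counts = Counter(mentions)
    return {place: n for place, n in counts.items() if n >= min_count}


# Hypothetical evaluation counts: 80 correct, 20 spurious, 20 missed entities.
print(round(f1_score(tp=80, fp=20, fn=20), 2))  # 0.8

# 'Rorna' stands for an OCR-garbled variant of 'Roma' with few occurrences.
mentions = ["Roma"] * 10 + ["Napoli"] * 8 + ["Rorna"] * 2
print(frequent_places(mentions))  # {'Roma': 10, 'Napoli': 8}
```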

The illustrated examples demonstrate the complex interactions between 
the materiality of the source and the digital object, between the enrichment 
operations and the concurrent curator’s context, and even among the 
enrichment operations themselves. The post-authentic framework high- 
lights the artificiality of any notion conceptualising digital objects as 
copies, unproblematised and disconnected from the material object. Indeed, 
understanding digital objects as post-authentic means acknowledging the 
continuous flow of interactions between the multiple factors at play, only 
some of which I have discussed here. Particularly in the context of digital 
cultural heritage, it means acknowledging the curators’ awareness that the 
past is written in the present and so it functions as a warning against 
ignoring the collective memory dimension of what is created, that is the 
importance of being digital. 


66 L.VIOLA 


3.3 NER AND GEOLOCATION 


In addition to the typical motivations for annotating a collection with 
referential entities such as sorting unstructured data and retrieving poten- 
tially important information, my decision to annotate ChroniclItaly 3.0 
using NER, geocoding and SA was also closely related to the nature of 
the collection itself, i.e., the specificity of the source. One of the most
valuable aspects of engaging with records of migrants’ narratives is the
possibility of studying how questions of cultural identities and nationhood are connected
with different aspects of social cohesion in transnational, multicultural 
and multilingual contexts, particularly as a social consequence of migra- 
tion. Produced by the migrants themselves and published in their native 
language, ethnic newspapers such as those collected in ChroniclItaly 3.0 
function in a complex context of displacement, and as such, they offer deep, 
subjective insights into the experience and agency of human migration 
(Harris 1976; Wilding 2007; Bakewell and Binaisa 2016; Boccagni and 
Schrooten 2018). 

Ethnic newspapers, for instance, provide extensive material for inves- 
tigating the socio-cognitive dimension of migration through markers of 
identity. Markers of identity can be cultural, social or biological such as 
artefacts, family or clan names, marriage traditions and food practices, 
to name but a few (Story and Walker 2016). Through shared claims of 
ethnic identity, these markers are essential to communities for maintaining 
internal cohesion and negotiating social inclusion (Viola and Verheul 
2019a). But in diasporic contexts, markers of identity can also reveal the 
subtle, changing renegotiations of migrants’ cultural affiliation in mediating
interests of the homeland with the host environment. Especially when 
connected with entities such as places, people and organisations, these 
markers can be part of collective narratives of pride, nostalgia or loss, and 
their analysis may therefore bring insights into how cultural markers of 
identity and ethnicity are formed and negotiated and how displaced indi- 
viduals make sense of their migratory experience. The ever-larger amount 
of available digital sources, however, has created a complexity that cannot 
easily be navigated, certainly not through close reading methods alone. 
Computational methods such as NER methodologies, though presenting 
limitations and challenges, can help identify names of people, places, brands 
and organisations thus providing a way to identify markers of identity on a 
large scale. 
We annotated ChroniclItaly 3.0 using a NER deep learning sequence
tagging tool (Riedl and Padó 2018) which identified 547,667 entities
occurring 1,296,318 times across the ten titles. A close analysis of the
output, however, revealed a number of issues which required a critical 
intervention combining expert knowledge and technical ability. In some 
cases, for example, entities had been assigned the wrong tag (e.g., ‘New 
York’ tagged as a person), other times elements referring to the same 
entity had been tagged as different entities (e.g., ‘Woodrow Wilson’, 
‘President Woodrow Wilson’), and in some other cases elements identified 
as entities were not entities at all (e.g., venerdi ‘Friday’ tagged as an 
organisation). To avoid the risk of introducing new errors, we intervened 
on the collection manually; we performed this task by first conducting a 
thorough historical triangulation of the entities and then by compiling a 
list of the most frequent historical entities that had been attributed the 
wrong tag. Although it was not possible to ‘repair’ all the tags, this
post-tagging intervention affected the redistribution of 25,713 entities 
across all the categories and titles, significantly improving the accuracy 
of the tags that would serve as the basis for the subsequent enrichment 
operations (i.e., geocoding and SA). Figure 3.4 shows how in some cases 
the redistribution caused a substantial variation: for example, the number 
of entities in the LOC (location) category significantly decreased in La 
Rassegna but increased in L’Italia. The documentation of these processes
of transformation is available Open Access and acts as a way to acknowledge
them as problematic, as undergoing several layers of manipulation
and interventions, including the multidirectional relationships between the 
specificity of the source, the digitised material and all the surrounding 
factors at play. Ultimately, the post-authentic framework to digital objects 
frames digital knowledge creation as honest and accountable, unfinished 
and receptive to alternatives. 
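The three kinds of manual post-tagging corrections described above (wrong tags, variants of the same entity, non-entities) can be represented as small correction tables applied to every (text, tag) mention. This is a hypothetical sketch: the table names, the sample mentions and the exact-match rules are illustrative, and the actual list was compiled by hand after historical triangulation rather than generated automatically. The final lines count distinct entities and occurrences, the two kinds of figures reported for the NER output.

```python
from collections import Counter

# Hypothetical correction tables; entries are illustrative only.
MERGE = {"President Woodrow Wilson": "Woodrow Wilson"}  # variant -> canonical
RETAG = {"New York": "GPE"}                             # entity -> correct tag
NOT_ENTITIES = {"venerdi"}                              # strings to drop


def correct(mentions):
    """Apply removal, merge and retag rules to (text, tag) mentions."""
    fixed = []
    for text, tag in mentions:
        if text.lower() in NOT_ENTITIES:
            continue                   # not an entity at all: drop it
        text = MERGE.get(text, text)   # map variants onto one canonical form
        tag = RETAG.get(text, tag)     # replace a known wrong tag
        fixed.append((text, tag))
    return fixed


mentions = [
    ("New York", "PER"), ("New York", "GPE"),
    ("Woodrow Wilson", "PER"), ("President Woodrow Wilson", "PER"),
    ("venerdi", "ORG"),
]
fixed = correct(mentions)
counts = Counter(fixed)
print(len(counts), "distinct entities,", sum(counts.values()), "occurrences")
# 2 distinct entities, 4 occurrences
print(Counter(tag for _, tag in fixed))
```

Retagging by surface form alone assumes that a string such as New York never refers to a person; rules of this kind are only safe for frequent, historically verified entities, which is also why not all tags could be 'repaired'.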

Once entities in ChroniclItaly 3.0 were identified, annotated and ver- 
ified, we decided to geocode places and locations and to subsequently 
visualise their distribution on a map. Especially in the case of large 
collections with hundreds of thousands of such entities, their visualisation 
may greatly facilitate the discovery of deeper layers of meaning that may 
otherwise be largely or totally obscured by the abundance of material 
available. I will discuss the challenges of visualising digital objects in
Chap. 5 and illustrate how the post-authentic framework can guide both
the development of a UI and the encoding of criticism into graphical
display approaches.

Fig. 3.4 Distribution of entities per title after intervention. Positive bars indicate
a decreased number of entities after the process, whilst negative bars indicate an
increased number. Figure taken from Viola and Fiscarelli (2021b)

Performing geocoding as an enrichment intervention is another exam- 
ple of how the process of digital knowledge creation is inextricably entan- 
gled with external dynamics and processes, dominant power structures and 
past and current systems in an intricate net of complexities. In the case 
of ChroniclItaly 3.0, for instance, the process of enriching the collection
with geocoding information shares many of the challenges faced by any
material whose language is not English. Indeed, as already discussed, the
relative scarcity of certain computational resources available for languages
other than English often dictates which tasks can be performed, with
which tools and through which platforms. Practitioners and scholars as
well as curators of digital sources often have to choose between creating
resources ad hoc (e.g., developing new algorithms, fine-tuning existing ones
or training their own models according to their specific needs) and simply
using the resources available to them. Either option may be far from ideal,
or even impossible, however. For example, due to time or resource
limitations or to a lack of specific expertise, the first approach may not be
economically or technically feasible. On the other hand, even when models
and tools in the language of the collection do exist, as in the case of
ChroniclItaly 3.0, they would typically have been created within the
context of another project and for other purposes, possibly using training
data-sets with very different characteristics from the material one is
enriching. This often means that the curator of the enrichment process
must inevitably make compromises with the methodological ideal.
For example, in the case of ChroniclItaly 3.0, in the interest of time, we 
annotated the collection using an already existing Italian NER model. The 
manual annotation of parts of the collection to train an ad hoc model 
would have certainly yielded much more accurate results but it would have 
been a costly, lengthy and labour-intensive operation. On the other hand, 
while being able to use an already existing model was certainly helpful
and provided an acceptable F1 score, it also resulted in poor individual
performance for the detection of the entity LOC (locations) (54.19%)
(Viola and Fiscarelli 2021a). This may have been due to several factors,
such as a lack of LOC-category entities in the data-set originally used for
training the NER model or a difference between the types of LOC entities
in the training data-set and those in ChroniclItaly 3.0. Regardless of the
reason, due to the low score, we decided not to geocode (and therefore not
to visualise) the entities tagged as LOC; they can, however, still be explored,
for example, as part of SA or in the GitHub documentation available Open
Access. Though not optimal, this decision was also motivated by the fact
that geopolitical entities (GPE) are generally more informative than LOC
entities as they typically refer to countries and cities (though sometimes
the algorithm also retrieved counties and states), whereas LOC entities
are typically rivers, lakes and geographical areas (e.g., the Pacific Ocean).
However, users should be aware that the entities currently geocoded are
by no means all the places and locations mentioned in the collection;
future work may also focus on performing NER using a more finely tuned
algorithm so that the LOC-type entities could also be geocoded.
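The selection logic described above can be sketched as follows, assuming each mention is a (text, tag) pair. The in-memory gazetteer is a hypothetical stand-in for a real geocoding service, and its coordinates are approximate. The point of the sketch is that LOC mentions are skipped by design and that unresolved GPE names also drop out, so the mapped places are only a subset of those mentioned.

```python
from collections import Counter

# Only GPE mentions are sent for geocoding; LOC mentions are kept in the
# data but deliberately not mapped, mirroring the decision described above.
GEOCODED_TAGS = {"GPE"}

# Hypothetical stand-in for a geocoding service: name -> (lat, lon).
GAZETTEER = {
    "Roma": (41.90, 12.50),
    "San Francisco": (37.77, -122.42),
}


def geocode_entities(mentions):
    """Return {place: ((lat, lon), mentions)} and the names left unresolved."""
    counts = Counter(text for text, tag in mentions if tag in GEOCODED_TAGS)
    located, unresolved = {}, []
    for place, n in counts.items():
        if place in GAZETTEER:
            located[place] = (GAZETTEER[place], n)
        else:
            unresolved.append(place)
    return located, unresolved


mentions = [("Roma", "GPE"), ("Roma", "GPE"), ("San Francisco", "GPE"),
            ("Pacifico", "LOC"), ("Lucca", "GPE")]
located, unresolved = geocode_entities(mentions)
print(located)     # {'Roma': ((41.9, 12.5), 2), 'San Francisco': ((37.77, -122.42), 1)}
print(unresolved)  # ['Lucca']
```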


3.4 SENTIMENT ANALYSIS 


Annotating textual material for attitudes—either sentiment or opinions— 
through a method called sentiment analysis (SA) is another enriching 
technique that can add value to digital material. This method aims to 
identify the prevailing emotional attitude in a given text, though it often 
remains unclear whether the method detects the attitude of the writer or 
the expressed polarity in the analysed textual fragment (Puschmann and 
Powell 2018). Within DeXTER, we used SA to identify the prevailing 
emotional attitude towards referential entities in ChroniclItaly 3.0. Our 
intention was twofold: firstly, to obtain a more targeted enrichment 
experience than would have been possible by applying SA to the entire
collection and, secondly, to study referential entities as markers of identity 
so as to access the layers of meaning migrants attached historically to 
people, organisations and geographical spaces. Through the analysis of 
the meaning humans invested in such entities, our goal was to delve into 
how their collective emotional narratives may have changed over time 
(Tally 2011; Donaldson et al. 2017; Taylor et al. 2018; Viola and Verheul 
2020a). Because of the specific nature of ChroniclItaly 3.0, this exploration 
inevitably intersects with understanding how questions of cultural identities 
and nationhood were connected with different aspects of social cohesion 
(e.g., transnationalism, multiculturalism, multilingualism), how processes 
of social inclusion unfolded in the context of the Italian American diaspora, 
how Italian migrants managed competing feelings of belonging and how 
these may have changed over time. 

SA is undoubtedly a powerful tool that can facilitate the retrieval of 
valuable information when exploring large quantities of textual material. 
Understanding SA within the post-authentic framework, however, means 
recognising that specific assumptions about what constitutes valuable 
information, what is understood by sentiment and how it is understood and 
assessed guided the design of the technique. All these assumptions are invisible
to the user; the post-authentic framework warns the analyst to be wary
of the indiscriminate use of the technique. Indeed, like other techniques 
used to augment digital objects including digital heritage material, SA 
did not originate within the humanities; SA is a computational linguistics 
method developed within natural language processing (NLP) studies as
a subfield of information retrieval (IR). In the context of visualisation 
methods, Johanna Drucker has long discussed the dangers of a blind 
and unproblematised application of approaches brought into the humanities
from other disciplines, including computer science. Regarding the
specific assumptions at the foundation of these techniques, she points out:
‘These assumptions are cloaked in a rhetoric taken wholesale from the 
techniques of the empirical sciences that conceals their epistemological 
biases under a guise of familiarity’ (Drucker 2011, 1). In Chap. 4, I will 
discuss the implications of a very closely related issue, the metaphorical 
use of everyday lexicon such as ‘sentiment analysis’, ‘topic modelling’ and 
‘machine learning’ as a way to create familiar images whilst however refer- 
ring to rather different concepts from what is generally internalised in the 
collective image. In the case of SA, for example, the use of the familiar word 
‘sentiment’ conceals the fact that this technique was specifically designed 
to infer general opinions from product reviews and that, accordingly, it 
was not conceived for empirical social research but first and foremost as an 
economic instrument. 

The application of SA in domains different from its original conception
poses several challenges which are well known to computational linguists
(the technique’s creators) but perhaps less so to others. Whilst opinions
about products and services are not typically problematic, as this is
precisely the task for which SA was developed, opinions about social and
political issues are much harder to tackle because of their much higher
linguistic and cultural complexity. This is because SA algorithms lack
sufficient background knowledge of the local social and political contexts,
not to mention the challenges of detecting and interpreting sarcasm, puns,
plays on words and irony (Liu 2020). Thus, although most SA techniques
will score opinions about products and services fairly accurately, they will
likely perform poorly when applied to opinionated social and political texts.
This limitation therefore makes the use of SA problematic when other
disciplines such as the humanities and the social sciences borrow it
uncritically; worse still, it raises disturbing questions when the technique
is embedded in a range of algorithmic decision-making systems based, for
instance, on content mined from social media. For example, since its
explosion in the early 2000s, SA has been heavily used in domains of
society that transcend the method’s original conception: it is routinely
applied to stock market prediction, in the health sector and by government
agencies to analyse citizens’ attitudes or concerns (Liu 2020).
In this already overcrowded landscape of interdependent factors, there is
another element that adds yet more complexity to the matter. As with other 
computational techniques, the discourse around SA depicts the method as 
detached from any subjectivity, as a technique that provides a neutral and 
observable description of reality. In their analysis of the cultural perception 
of SA in research and the news media, Puschmann and Powell (2018) 
highlight for example how the public perception of SA is misaligned with its 
original function and how such misalignment ‘may create epistemological 
expectations that the method cannot fulfill due to its technical properties 
and narrow (and well-defined) original application to product reviews’ (2). 
Indeed, we are told that SA is a quantitative method that provides us with 
a picture of opinionated trends in large amounts of material otherwise 
impossible to map. In reality, the reduction of something as idiosyncratic
as the definition of human emotions to two or three categories is highly
problematic as it hides the whole set of assumptions behind the very
establishment of such categories. For example, it remains unclear what is
meant by neutral, positive or negative, as these labels are typically presented
as a given, as if they were unambiguous, universally accepted categories
(Puschmann and Powell 2018). On the contrary, to put it in Drucker’s 
words ‘the basic categories of supposedly quantitative information [...] are 
already interpreted expressions’ (Drucker 2011, 4). 

Through the lens of the post-authentic framework, the application of 
SA is acknowledged as problematic and so is the intrinsic nature of the 
technique itself. An SA task is usually modelled as a classification problem,
that is, a classifier processes pre-defined elements in a text (e.g., sentences), 
and it returns a category (e.g., positive, negative or neutral). Although 
there are so-called fine-grained classifiers which attempt to provide a more 
nuanced distinction of the identified sentiment (e.g., very positive, positive, 
neutral, negative, very negative) and some others even return a prediction 
of the specific corresponding sentiment (e.g., anger, happiness, sadness), 
in the post-authentic framework, it is recognised that it is the fundamental 
notion of sentiment as discrete, stable, fixed and objective that is highly 
problematic. In Chap. 4, I will return to this concept of discrete modelling 
of information with specific reference to ambiguous material, such as 
cultural heritage texts; for now, I will discuss the issues concerning the 
discretisation of linguistic categories, a well-known linguistic problem. 
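To make the classification framing concrete, here is a deliberately naive lexicon-based classifier. It is not how the Google Cloud model or any production system works; the lexicon, its weights and the zero cut-off are invented for illustration. It shows the point made above: the output is one of a few discrete labels, and words as different as 'sad' and 'angry' collapse into the same category because the lexicon's author decided so.

```python
import re

# Toy lexicon: word -> polarity weight. Invented for illustration; note that
# 'sad' and 'angry' receive the same weight, so they become indistinguishable.
LEXICON = {"proud": 1.0, "happy": 1.0, "sad": -1.0, "angry": -1.0}


def classify(sentence: str) -> str:
    """Sum word polarities and return one of three discrete labels."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(LEXICON.get(word, 0.0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for s in ["They were sad.", "They were angry!", "Proud but sad", "The ship arrived"]:
    print(f"{s!r}: {classify(s)}")
```

The mixed sentence "Proud but sad" comes out as neutral, exactly like the emotionally empty "The ship arrived": the three-way output cannot tell them apart.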

In his classic book Foundations of Cognitive Grammar, Ronald Lan- 
gacker (1983) famously pointed out how it is simply not possible to 
unequivocally define linguistic categories; this is because language does not 
exist in a vacuum and all human exchanges are always context-bound,
viewpointed and processual (see Langacker 1983; Talmy 2000; Croft and Cruse
2004; Dancygier and Sweetser 2012; Gärdenfors 2014; Paradis 2015). In 
fields such as corpus linguistics, for example, which heavily rely on manually 
annotated language material, disagreement between human annotators on
the same annotation decisions is in fact expected and taken into account
when drawing linguistic conclusions. The extent of this disagreement is
captured by ‘inter-annotator agreement’, a measure of how often the
annotators’ decisions about a label coincide. The inter-annotator
agreement measure is typically a percentage and depends on many factors 
(e.g., number of annotators, number of categories, type of text); it can 
therefore vary greatly, but generally speaking, it is never expected to be 
100%. Indeed, in the case of linguistic elements whose annotation is highly 
subjective because it is inseparable from the annotators’ culture, personal 
experiences, values and beliefs—such as the perception of sentiment—this 
percentage has been found to remain at 60-65% at best (Bobicev and 
Sokolova 2018). 
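The percentage agreement described above is simple to compute. The sketch below also computes Cohen's kappa, a widely used chance-corrected alternative that the text does not mention and that is added here only for comparison. Both annotators' labels are invented for illustration.

```python
from collections import Counter


def percent_agreement(a: list, b: list) -> float:
    """Share of items (in %) on which two annotators chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: list, b: list) -> float:
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[label] * count_b[label] for label in labels) / n ** 2
    if expected == 1:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


ann_1 = ["pos", "neg", "neu", "pos", "neg", "neu", "pos", "neu", "neg", "pos"]
ann_2 = ["pos", "neg", "pos", "pos", "neu", "neu", "neg", "neu", "neg", "neu"]
print(percent_agreement(ann_1, ann_2))       # 60.0
print(round(cohens_kappa(ann_1, ann_2), 2))  # 0.4
```

Here the two hypothetical annotators agree on 60% of the items, but once the agreement expected by chance is discounted, kappa is only about 0.40, which illustrates how modest human consensus on sentiment labels can be.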

The post-authentic framework to digital knowledge creation introduces 
a counter-narrative into the mainstream discourse that oversimplifies automated
algorithmic methods such as SA as objective and unproblematic and 
encourages a more honest conversation across fields and in society. It 
acknowledges and openly addresses the interrelations between the chosen 
technique and its deep entrenchment in the system that generated it. In the 
case of SA, it advocates more honesty and transparency when describing 
how the sentiment categories have been identified, how the classification 
has been conducted, what the scores actually mean, how the results have 
been aggregated and so on. At the very least, an acknowledgement of such 
complexities should be present when using these techniques. For example, 
rather than describing the results as finite, unquestionable, objective and 
certain, a post-authentic use of SA incorporates full disclosure of the 
complexities and ambiguities of the processes involved. This would con- 
tribute to ensuring accountability when these analytical systems are used in 
domains outside of their original conception, when they are implemented 
to inform centralised decisions that affect citizens and society at large or when
they are used to interpret the past or write the future past. 

The decision of how to define the scope (see for instance Miner 2012) 
prior to applying SA is a good example of how the post-authentic frame- 
work can inform the implementation of these techniques for knowledge 
creation in the digital. The definition of the scope includes defining 
problematic concepts of what constitutes a text, a paragraph or a sentence,
and how each one of these definitions impacts on the returned output, 
which in turn impacts on the digitally mediated presentation of knowledge. 
In other words, in addition to the already noted caveats of applying SA 
particularly for empirical social research, the post-authentic framework
recognises the full range of complexities derived from preparing the 
material, a process—as I have discussed in Sect. 3.2—made up of countless 
decisions and judgement calls. The post-authentic framework acknowl- 
edges these decisions as always situated, deeply entrenched in internal and 
external dynamics of interpretation and management which are themselves 
constructed and biased. For example, when preparing ChroniclItaly 3.0 
for SA, we decided that the scope was ‘a sentence’ which we defined as the 
portion of text: (1) delimited by punctuation (i.e., full stop, semicolon, 
colon, exclamation mark, question mark) and (2) containing only the most 
frequent entities. While this approach considerably reduced processing
time and costs, it may have caused less frequently mentioned entities to
be underrepresented. To at least partially overcome this limitation, we
used the logarithmic function 2*log2 to obtain a more homogeneous
distribution of entities across the different titles, as shown
in Fig. 3.5.
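The scope definition above can be sketched as a punctuation-based splitter followed by an entity filter. This is an illustration under assumptions: entity matching is a naive substring test, abbreviations ending in a full stop are not handled, and the example sentence and entity set are invented. The text does not say how the function 2*log2 was applied; purely to show the compressing effect of such a function, the sketch assumes it is applied to raw entity frequencies.

```python
import math
import re

# Sentence boundaries as defined in the text: full stop, semicolon, colon,
# exclamation mark and question mark.
BOUNDARY = re.compile(r"[.;:!?]")


def scoped_sentences(text: str, entities: set) -> list:
    """Return punctuation-delimited segments mentioning a selected entity.

    Matching is a naive substring test (an assumption for illustration).
    """
    segments = (segment.strip() for segment in BOUNDARY.split(text))
    return [s for s in segments if s and any(e in s for e in entities)]


def log_weight(freq: int) -> float:
    """Assumed reading of 2*log2 applied to a raw frequency."""
    return 2 * math.log2(freq)


text = ("Il presidente Wilson parlò a New York; la folla applaudì. "
        "Piove a Roma! Nessuna notizia.")
print(scoped_sentences(text, {"Wilson", "New York"}))
# ['Il presidente Wilson parlò a New York']

# A 128-fold difference in raw frequency shrinks to roughly 3.3-fold.
print(log_weight(1024), log_weight(8))  # 20.0 6.0
```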

As for the implementation of SA itself, due to the lack of suitable
SA models for Italian when DeXTER was carried out, we used the
Google Cloud Natural Language Sentiment Analysis API (Application
Programming Interface) within the Google Cloud Platform Console, a
console of technologies which also includes NLP applications in a wide
range of languages. The SA API returned two values: sentiment score and
sentiment magnitude. According to the available documentation provided
by Google, the sentiment score, which ranges from −1 to 1, indicates
the overall emotion polarity of the processed text (e.g., positive, negative,
neutral), whereas the sentiment magnitude indicates how much emotional
content is present within the document; the latter value is often
proportional to the length of the analysed text. Unlike the score, the
magnitude is not normalised: it ranges from 0 upwards, whereby values
close to 0 indicate what Google defines as ‘low-emotion content’ and
higher values indicate more emotional content, regardless of whether the
emotion is identified as positive or negative. The magnitude value is meant
to help differentiate between low-emotion and mixed-emotion cases, as
they would both be scored as neutral by the algorithm. As such, it alleviates
the issue of reducing something as vague and subjective as the perception
of emotions to three rigid and unproblematised categories.

Fig. 3.5 Logarithmic distribution of selected entities for SA across titles. Figure
taken from Viola and Fiscarelli (2021b)

However, the post-authentic framework recognises that any conclusion based on results
derived from SA should acknowledge a degree of inconsistency between 
the way the categories of positive, negative and neutral emotion have 
been defined in the training model and the writer’s intention in the actual 
material to which the model is applied. Specifically, the Google Cloud 
Natural Language Sentiment Analysis algorithm differentiates between 
positive and negative emotion in a document, but it does not specify what is 
meant by positive or negative. For example, if in the model sentiments such
as ‘angry’ and ‘sad’ are both categorised as negative emotions regardless of
their context, the algorithm will identify either text as negative, not as ‘sad’
or ‘angry’, thus adding further ambiguity to the already problematic and
non-transparent way in which ‘sad’ and ‘angry’ were originally defined
and categorised. To deal, at least marginally, with this issue, we established
a threshold within the sentiment range for defining ‘clearly positive’ (i.e.,
> 0.3) and ‘clearly negative’ cases (i.e., < −0.3). The downside of this
approach, however, was that all the cases between these two values were
treated as neutral/mixed-emotion cases, which inevitably led to
a flattening of nuances. In Chap. 5, I will return to the ambiguities of SA
when discussing the design choices for developing the DeXTER app, the 
interactive visualisation tool to explore ChroniclItaly 3.0, and I will present 
suggestions towards visualising the complexities and uncertainties in data- 
models and visualisation techniques. 
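The thresholding described above amounts to a three-way binning of the document-level score. The sketch reproduces the 0.3 and −0.3 cut-offs. The second function, which uses magnitude to separate low-emotion from mixed-emotion cases, is a hypothetical extension with an arbitrary magnitude cut-off of 1.0 that DeXTER did not apply; it is included only to show what the single neutral/mixed bin flattens. Scores and magnitudes are assumed to have already been obtained from the API.

```python
def sentiment_label(score: float, threshold: float = 0.3) -> str:
    """Bin a sentiment score in [-1, 1] using symmetric thresholds."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie between -1 and 1")
    if score > threshold:
        return "clearly positive"
    if score < -threshold:
        return "clearly negative"
    return "neutral/mixed"


def split_neutral(score: float, magnitude: float,
                  threshold: float = 0.3, min_magnitude: float = 1.0) -> str:
    """Hypothetical extension: use magnitude to split the neutral/mixed bin.

    min_magnitude is an arbitrary illustrative cut-off, not a documented value.
    """
    label = sentiment_label(score, threshold)
    if label != "neutral/mixed":
        return label
    return "mixed emotion" if magnitude >= min_magnitude else "low emotion"


for score in (0.8, 0.3, -0.05, -0.6):
    print(score, sentiment_label(score))
print(split_neutral(0.0, 0.2), "|", split_neutral(0.05, 3.5))  # low emotion | mixed emotion
```

Because the comparisons are strict, a score of exactly 0.3 still falls into the neutral/mixed bin, matching the '> 0.3' and '< −0.3' definitions above.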

The application of the post-authentic framework to SA highlights that
the technique is far from being methodologically ideal and it calls attention
to all the uncertainties of using it in fields other than IR and for tasks other
than product review, such as the use case discussed here. The post-authentic
framework therefore acts as a warning against these shortcomings and
creates a space for accountability for the adopted curatorial decisions.
Within DeXTER and ChroniclItaly 3.0, we thoroughly documented such
decisions, which can be accessed through the openly available dedicated
GitHub repository, which also includes the code, links to the original and
processed material, and the files documenting the manual interventions.
Ultimately, the post-authentic framework counterbalances the dominant
public discourse (as distinct from computational research), which promotes
SA as an exact way to measure emotions and opinions; it recognises when
its use is disconnected from its original purpose, and it accordingly
advocates the reworking of the user’s epistemological expectations.

In this respect, the implementation of the post-authentic framework 
for knowledge creation in the digital relates to one of the central pillars 
of science, that of replicability (or reproducibility/repeatability).¹⁰ The
principle postulates that following a study’s detailed descriptions, claims 
and conclusions obtained by scientists can be verified by others. This is 
done in the name of transparency, traceability and accountability, which are 
also fundamental aspects of post-authentic work. The difference, however,
lies in the purpose of these fundamental notions; whereas in science they are 
primarily aimed at allowing independent confirmation of a study’s results, 
within the post-authentic framework, they are not solely concerned with 
this specific scientific goal and in fact they move beyond it. For example, 
traditionally, a study is believed to be replicable if sufficient transparency 
has been observed on the data, the research purposes, the method, the 
conclusions, etc., and yet some studies can be perfectly transparent and not
at all replicable (Peels 2019; Viola 2020b). This is believed to be the case
especially in the humanities, where the very nature of some studies, for
example a particularly interpretative analysis, can make replication
impossible (Peels 2019).
On the opposite end of the scale, empirical works are believed to be, at
least in theory, fully replicable. Thus, despite the still unresolved debate
on the ‘R-words’, over the years, protocols and standards for replication in
science have been perfected and systematised. When computers started
to be used for experiments and data analysis, things became complicated.
Plesser (2018), for instance, explains how it became apparent that the
canonical margins for experimental error somehow did not apply to digital
research:


Since digital computers are exact machines, practitioners apparently assumed 
that results obtained by computer could be trusted, provided that the 
principal algorithms and methods employed were suitable to the problem 
at hand. Little attention was paid to the correctness of implementation, 
potential for error, or variation introduced by system soft- and hardware, 
and to how difficult it could be to actually reconstruct after some years—or 
even weeks—how precisely one had performed a computational experiment. 
(Plesser 2018, 1) 
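A small Python illustration of Plesser’s point, not drawn from the source: even an ‘exact machine’ can return different results for what is mathematically the same calculation, and a pseudo-random experiment can only be rerun identically if its seed is recorded. The `sample` helper is hypothetical.

```python
import random

# Floating-point addition is not associative: the same three numbers,
# summed in a different order, give results that differ in the last digit.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False


def sample(seed: int, n: int = 3) -> list[float]:
    """Hypothetical helper: draw n pseudo-random numbers from a seeded generator."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# Recording the seed is what makes this 'experiment' repeatable.
print(sample(42) == sample(42))  # True
```

Neither effect is visible in a paper that reports only the algorithm used, which is why documenting implementation details matters for replication.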


The post-authentic framework is comfortable with the belief that complete
objectivity (and therefore perfect replicability) is always an illusion.
Indeed, the post-authentic relevance of transparency, traceability and,
consequently, accountability lies primarily in the acknowledgement of a
collective responsibility, the one that comes with building a source of
knowledge for current and future generations. Thus, within the
post-authentic framework, being transparent about both the ‘raw’ and the
processed material, about the methodology, the analytical processes and the
tools assumes a whole new importance: the creation of other digital forms
which make it possible to trace technical obsolescence, acknowledge power relations
and attempt to fluidly incorporate the exchanges that lead to symbiosis,
not friction, across interactions. As argued by Fiona Cameron with regard 
to digital cultural heritage (2021, 12): 


[digital cultural heritage] encapsulate[s] other registers of significance, tem- 
porality and agency such as planetary technological infrastructures, material 
agency, non-human, elemental, and earthly processes, all of which are 
invisible figures in their constitution. 


The post-authentic framework for digital knowledge creation recognises 
that whatever arises out of the confluence of all these different agencies 
cannot be fully predicted. The role of documentation by researchers,
museums, archives, libraries, software developers and so on acts therefore as 
a means to acknowledge that we are writing the future past and that writing 
the past means controlling the future. The post-authentic framework 
provides an architecture to meet the need for accountability to current 
and future generations. 

Finally, the documentation of the interventions has wider resonance,
particularly in relation to increasing awareness of sustainability in
digital knowledge creation. In June 2020, the UN published the Roadmap
for Digital Cooperation report, which set out a list of key actions to be
achieved by 2030 in order to advance a more equitable digital world. 
Whilst acknowledging that ‘Meaningful participation in today’s digital 
age requires a high-speed broadband connection to the Internet’ (United 
Nations 2020b, 5), the report also highlights that half of the world’s 
population (3.7 billion people) currently does not have access to the 
Internet. The lack of digital access, also commonly referred to as the ‘Digital
Divide’, mostly affects people in least developed countries (LDCs),
landlocked developing countries (LLDCs) and small island developing
states (SIDS), with an even more acute gap in regions such as sub-Saharan
Africa, where only 11% have access to household computers and 82% lack
Internet access altogether.

The digital inequality worsens the already existing inequalities in society 
as those who are the most vulnerable are disproportionately affected by 
the divide. Based as it is on a universal vision of digital transformation, 
current digital knowledge creation practices therefore face not only the
danger of being available exclusively to half of humanity but also of 
yet again imposing Western-centred perspectives on how knowledge is 
created and accessed. The future looks ever more digital and digitally 
available repositories will become larger and larger; reconceptualising 
digital objects within the post-authentic framework also means fostering
their reconceptualisation not just in terms of what we are digitising but
also how and for whom. In this sense, the creation, curation, analysis and
visualisation of digital objects should, whenever possible, favour methods
and practices that make curatorial workflows sustainable, interoperable and 
reusable. This should include the storage of the material in an Open Access 
repository, the use of freely available and fully documented software and 
a thorough documentation of the implemented steps and interventions, 
including an explanation of the choices made which will in turn facilitate 
research accessibility, transparency and dissemination. 
In the next chapter, I will illustrate the third use case of the book, the
application of the post-authentic framework to digital analysis. Through 
the example of topic modelling, I will show how the post-authentic 
framework can guide a deep understanding of the assemblage of culture 
and technology in software and help us achieve the interpretative potential 
of computation. I will specifically discuss the implications for knowledge 
creation of the transformation of continuous material into discrete form
(binary sequences of 0s and 1s), with particular reference to the notions
of causality and correlations. Within this broader discussion, I will then 
illustrate the example of topic modelling as a computational technique 
that treats a collection of texts as discrete data, and I will focus on the 
critical aspects of topic modelling that are highly dependent on the sources: 
pre-processing, corpus preparation and deciding on the number of topics. 
The topic modelling example ultimately shows how producing digital 
knowledge requires sustained engagement with software, in the form of 
fluid, symbiotic exchanges between processes and sources. 


NOTES 


1. See, for instance, the Europeana Semantic Enrichment Framework at
https://pro.europeana.eu/share-your-data/enrichment.

2. usa in Italian means ‘he/she/it uses’. 

3. For the documentation of the training of the Italian model used to annotate 
ChroniclItaly 3.0 including information on the output format and F1 score, 
please see Viola et al. (2019); Viola and Fiscarelli (2021b). 

4. https://github.com/lorellav/DeXTER-DeepTextMiner.

5. This logarithmic function is the inverse function of the exponential function.

6. https://cloud.google.com/natural-language/docs/analyzing-sentiment.

7. https://console.cloud.google.com/. 

8. https://cloud.google.com/natural-language/docs/basics#interpreting_sentiment_analysis_values.

9. See note 5. 

10. For an overview of the so-called R debate, please see Rougier et al. (2016). 
CHAPTER 4


How Discrete 


If you torture the data long enough, it will confess to anything. (Attributed 
to Ronald H. Coase, 1960) 


4.1 METAPHORS WITH DESTINY 


Metaphors are fascinating and powerful linguistic devices. Over the years, 
numerous scholars have indeed extensively explored their manipulative 
talent for creating realities (see for instance, Lakoff 1992, 2004, 2008; 
Goatly 2007; Mio and Katz 2016). In the context of political discourse 
alone, for example, the study of metaphors’ capacity to hide or popularise 
latent ideologies, justify or blame governments’ decisions, or strategically 
attribute blame goes back decades (e.g., Musolff 2004, 2010, 2014; Goatly 
2007; Ottati et al. 2014; Viola 2020a). Though extremely powerful
(‘Metaphors can Kill’, Lakoff 1992, 1), metaphors are neither good nor
bad per se; we simply routinely use them, often rather unreflectively, so 
that abstract and complex ideas can be processed in a cognitively simplified 
way (ibid.). What makes metaphors so effective, particularly conceptual 
metaphors, is their use of conceptual frames such as war, disease, sport, 
family, religion and others which, by evoking mental images that are famil- 
iar to the message receivers, can turn complex concepts into a simple, linear 
logic (Viola 2020a). It is thanks to this ‘framing power’ that metaphors’ 
arguments become plausible and the proposed conclusions are perceived as 
unproblematic and even ‘self-evident’ (Musolff 2016, 133). Moreover, as
we mostly use metaphors implicitly, such framing power typically remains
unnoticed, and so do metaphors. For example, in the context of the
COVID-19 pandemic, when commenting on the effectiveness of Italy’s
decision to institute a national lockdown, the then French Prime Minister
Édouard Philippe said, ‘To block the country does not allow to contain
the epidemy’¹ (Valeurs actuelles 2020). At the time when the comment
was made, France was adopting much less drastic measures compared to 
Italy; therefore, the differences in the two countries’ crisis management 
approaches needed to be justified, and in order to be accepted by the 
nation, the domestic strategy had to be presented to the public as the 
best possible solution (Viola 2022). In this particular example, the framing 
power is conveyed by the expression to block the country: the metaphorical 
use of the verb to block frames the Italian lockdown measure not only as 
overly aggressive but wrongly targeted: it is the country that is put to a 
halt, not the spread of the virus. 

But metaphors are not typically found just in political discourse; scien- 
tific discourse also regularly exploits the power of metaphors to simplify 
complex concepts. In 2003, Blei et al. published a study which, at the 
moment of writing, counts 36,483 citations (2003). The paper tackled 
the task of modelling a collection of discrete data, for example, a corpus 
of texts, for efficient processing tasks such as classification and content 
summarisation. The authors’ basic idea was to model each item in the 
collection, e.g., each text, according to the Latent Dirichlet Allocation 
(LDA) model, a generative probabilistic model for which documents are 
represented as distributions of sets of words statistically likely to occur 
together. Although the article itself was titled ‘Latent Dirichlet Allocation’, 
the technique described in the article went down in history as topic 
modelling. The reason for that may lie in the fact that the authors
had decided to call the above-mentioned sets of words ‘topics’, although
their intention was not to make epistemological claims regarding the latent
variables but to simply ‘exploit text-oriented intuitions’ (996), that is, to 
take advantage of a familiar image such as that of topics. In other words, 
the term topic was used metaphorically. 
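To make the metaphor concrete, the sketch below follows the generative story of LDA as described by Blei et al. (2003): each document draws its own mixture of topics from a Dirichlet distribution, and each word is produced by first picking a topic from that mixture and then picking a word from that topic. The two ‘topics’ and their vocabularies are invented for illustration, and only the Python standard library is used. Fitting LDA to a real corpus means inverting this process (Blei et al. used variational inference), which is not shown here.

```python
import random


def sample_dirichlet(alpha: list[float], rng: random.Random) -> list[float]:
    """Draw topic proportions from a Dirichlet distribution.

    Normalising independent Gamma(alpha_i, 1) draws yields a Dirichlet sample.
    """
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics: list[dict[str, float]], alpha: list[float],
                      length: int, rng: random.Random) -> tuple[list[float], list[str]]:
    """Generate one bag of words following LDA's generative story."""
    theta = sample_dirichlet(alpha, rng)  # this document's mixture of topics
    words = []
    for _ in range(length):
        # 1. choose a topic according to the document's topic proportions
        z = rng.choices(range(len(topics)), weights=theta)[0]
        # 2. choose a word according to that topic's word distribution
        vocab = list(topics[z])
        probs = list(topics[z].values())
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words


# Two hypothetical "topics": each is only a probability distribution over words.
topics = [
    {"ship": 0.4, "port": 0.3, "emigrant": 0.3},
    {"vote": 0.5, "mayor": 0.3, "election": 0.2},
]
rng = random.Random(0)
theta, doc = generate_document(topics, alpha=[0.5, 0.5], length=10, rng=rng)
print([round(p, 2) for p in theta], doc)
```

Nothing in the code knows what a topic is ‘about’: a topic is only a probability distribution over words, and any label attached to it is supplied by the human reader.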

A similar observation about the metaphorical use of everyday notions to 
refer to techniques which are however based on specific, rather different, 
principles may also apply to computational techniques such as ‘sentiment 
analysis’ and ‘machine learning’. The metaphorical use of the terms ‘sen- 
timent’, ‘learning’ and ‘topic’ may be harmless within the fields that have 
devised such techniques because the principles upon which they are based
are very clearly defined by their creators and understood in those circles.
It may, on the contrary, have huge consequences when these methods are
passively transferred into other disciplines or practices. In his analysis of 
informational approaches in cancer biology research, Longo (2018), for 
example, critiques the extensive use of computer science terminology such 
as ‘instructions’, ‘to reprogram a deprogrammed DNA’ and in general the 
DNA described as a computer program and genes as information carriers. 
He argues (88): 


The informational approach in biology conflates the concept of program- 
ming on discrete data with the common-sense understanding of ‘informa- 
tion’ and ‘computer program’, which are vaguely familiar to everybody [...] 
In fact, the use of ‘information’ and ‘programming’ in biology is not scientific
because it neither applies the mathematical invariants proper to information 
and programming, nor the theorems proper to the corresponding scientific 
disciplines. Instead, it transfers a vague, everyday notion and refers to ‘weak’ 
meanings. 


Longo argues that the metaphorical use of mathematical and compu- 
tational language has had enormous consequences for molecular biology 
cancer research which essentially studies cancer as the result of DNA de- 
programming, inherited or otherwise caused by a carcinogen that disrupts 
the DNA ‘encoded instructions’ (92). The use of an everyday notion 
such as that of ‘program’, he continues, has also no doubt facilitated 
understanding among funding agencies and the public, perhaps even 
leading to the exclusion of alternative hypotheses. Similarly, one might 
argue that it is the metaphorical use of the word topic that explains why 
topic modelling has become so popular beyond computer science and in 
the humanities in particular: whereas not everyone may be an expert in 
statistical modelling, we are all more or less familiar with a fairly general 
conceptualisation of what a topic is. However, what humanities scholars 
may not have been too familiar with (and, to a large extent, still are not) is
the set of assumptions behind a method born in the computer sciences and 
adopted in critical research. 

The popularity of topic modelling beyond computer science (as well 
as SA and ML) is closely related to another phenomenon, well-known in 
linguistics: when a metaphor is adopted by a significant part of the linguistic 
community, language users may no longer be aware of its metaphorical 
use, the metaphor becomes a common meaning and so it dies (Ricoeur
2003, 115). The metaphorical use of ‘sentiment’, ‘learning’ and ‘topic’,
I will argue here, has certainly contributed to making these techniques
very popular outside of their field of origin. At the same time, however,
precisely because of this popularity, these meanings have become common
meanings, i.e., ‘dead metaphors’. This in turn has major consequences: the
creation of epistemological expectations that these methods will obviously
disappoint (Puschmann and Powell 2018). For example, as I have discussed
in Chap. 3 in reference to SA, the familiar word ‘sentiment’ creates a
specific epistemological expectation, that it is somehow possible to obtain
a neutral way to assess attitudes and moods in large quantities of material. 
Assessment, however, requires language understanding as a prerequisite 
and when it comes to machines, this is exactly what they are not able to 
do. The post-authentic framework that I advance in this book serves also 
as a reminder that these terms are used as mere metaphors. 

In the next section, I will discuss a more concerning aspect concealed 
by the use of vague, familiar notions such as ‘sentiment’, ‘learning’ and 
‘topic’: the underlying process upon which these techniques are based, i.e., 
the elaboration of continuous information into discrete systems and the 
implications for causality. In discrete systems, causality is hidden because 
information is rendered as exact and separate points, all encoded in one 
dimension and according to precise instructions (Longo 2018). The three- 
dimensional, causal essence of information cannot be accessed by the 
user who, instead, is offered an altered image made up of predictions 
of correlations. The resulting information will still refer to its original 
continuous structure, but computers will only render it as a sequence of 0s
and 1s, that is, in discrete form, thus hiding relational causality.
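As a minimal illustration of what ‘a sequence of 0s and 1s’ means in practice, the Python sketch below renders a word as the bits of its UTF-8 encoding and reads it back. The choice of UTF-8 is an assumption made for the example; other encodings yield different bit patterns.

```python
def to_bits(text: str) -> str:
    """Encode text as UTF-8 and show each byte as eight binary digits."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Reverse the process: read each 8-digit group back as a byte."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


print(to_bits("sad"))    # 01110011 01100001 01100100
print(to_bits("angry"))
print(from_bits(to_bits("città")))  # the accented letter takes two bytes
```

The bit patterns for ‘sad’ and ‘angry’ share nothing that corresponds to their meaning; any relation between the two words has to be reconstructed statistically, from how they are distributed in the data.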

In the case of SA, this distorted image is reflected in the reduction 
of the subjectivity of human emotions to two/three categories, scored 
according to probabilistic calculations; in the case of ML, the holistic, 
human capacity to acquire knowledge and skills through experience, logic 
and contextual factors is reduced to the probabilistic processing of huge, 
yet partial, quantities of discrete data; in the case of topic modelling, the 
text itself disappears and so does its entrenchment in the wider context 
that produced it. In all these cases, the three-dimensional, causal structure 
is no longer accessible nor is its historical and social susceptibility as it 
is all dissembled by the computational, dualistic system of 0s and 1s.
This conflation of discrete data modelling with familiar notions such as
‘sentiment’, ‘learning’ and ‘topic’ has therefore certainly contributed to
making these methods extremely popular outside their fields of origin, but
at the same time, it has obfuscated the well-defined laws upon which they 
are based. Longo claims: 


This is an amazing technological achievement: by fine engineering, one may 
forget the underlying physical hardware and its continuous flows and just 
consider (and work on) the discrete software processes by writing alpha- 
numeric programs. (Longo 2018, 87) 


In a world where all information is digital, the consequence of this 
amazing technological achievement is that it also presents a distorted image 
of knowledge because, to paraphrase David Tong, the world does not seem 
to be discrete (Tong 2011). 

In this chapter, I first examine the implications of adopting discrete 
methods and technologies not just as quantitative tools in the humanities 
but for knowledge production in general and, more widely, for our 
understanding of society. Specifically, I reflect on the notions of causality 
and correlations in light of the considerations discussed so far about 
the mythicised discourse on data and technology neutrality, the dangers 
of using metaphorical language to refer to digital technologies and the 
consequent urgent need for knowledge reconfiguration inspired by
symbiosis and mutualism. I then proceed to examine the text mining 
technique of topic modelling and the premises on which it is based with 
a special focus on its use of discrete mathematics to encode information. 
Finally, I illustrate how applying the post-authentic framework to topic 
modelling can facilitate critical engagement with this technique, especially 
in humanities research. 

In my discussion, I argue that such engagement can only happen by 
maintaining a sustained connection with the digital object and I demon- 
strate how the application of key post-authentic concepts and methods 
can be especially effective at three decisive stages in a topic modelling 
workflow: pre-processing, corpus preparation and choosing the number 
of topics. The post-authentic framework, as the analysis will show, may 
be especially effective at prompting the active and reflexive participation 
of the user in the process of knowledge production in the digital. In the 
next section, I start my argument by discussing the implications of the 
‘big data philosophy’, that is, the obsession with patterns and correlations 
as opposed to causation, to explain phenomena; I also examine such 
implications in relation to topic modelling and its use for knowledge
creation, in humanistic enquiry and beyond. 


4.2 CAUSALITY, CORRELATIONS, PATTERNS 


Perhaps one of the most significant implications of the ‘Digital Turn’ in the 
humanities, more widely in the natural, computational and social sciences, 
and more widely still in relation to the digitisation of society, is contained
in the notion of discrete vs continuous modelling of information. The 
concepts of discrete and continuous and the tension between the two 
are at the foundation of mathematical thought and of how mathematical 
modelling is used to explain natural phenomena (Fenstad 1985). A way to 
understand the crucial difference between discrete and continuous struc- 
tures is to consider that in a discrete structure, all points are isolated and 
completely disconnected from each other; one can therefore label them and 
count them and their count is exact and absolute. On the contrary, one can 
only access a continuous structure by measuring it and these measurements 
create intervals or fractions of intervals; moreover, in the continuous, a scale 
for the measurement has to be set (Longo 2018, 84). Therefore, in discrete 
systems, there is no room for approximation, no uncertainty, no nuances, as 
something is either one point or another, whereas in the continuous—since 
phenomena can only be accessed by measuring them—the measurements 
are always approximated (Longo 2019, 64–65).
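The contrast between counting and measuring can be sketched in a few lines of Python. This is an illustration under simple assumptions, not Longo’s formalism: a list of labelled items stands for a discrete structure, and the diagonal of a unit square (√2) stands for a continuous quantity that can only be bracketed by an interval whose width depends on the chosen scale. Note that `math.sqrt(2)` is itself already a finite binary approximation of √2.

```python
import math
from fractions import Fraction


def count(points: list[str]) -> int:
    """Discrete structure: isolated, labelled points can be counted exactly."""
    return len(points)


def measure(value: float, scale: Fraction) -> tuple[Fraction, Fraction]:
    """Continuous quantity: a measurement at a chosen scale yields an interval.

    Fractions keep the interval bounds exact, so the only approximation is
    the one introduced by the chosen scale.
    """
    steps = math.floor(Fraction(value) / scale)
    lower = steps * scale
    return lower, lower + scale


print(count(["issue 1", "issue 2", "issue 3"]))  # exactly 3, no uncertainty
for scale in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    lower, upper = measure(math.sqrt(2), scale)
    print(f"scale {scale}: between {float(lower)} and {float(upper)}")
```

A finer scale narrows the interval but never closes it: the measurement remains an approximation, whereas the count admits no intermediate values.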

Even without going too deep into the full mathematical (and physical!) 
ramifications of these two notions, one can intuitively understand that 
they refer to very different ways of mathematical thinking. A fundamental 
difference particularly relevant to the arguments advanced in this book 
is concerned with the understanding of causality, a notion whose theo- 
retical conceptualisation from philosophy to physics can be traced back 
to antiquity. For the sake of the argument advanced in this chapter, I 
will summarise the discussion by saying that in the classical worldview 
which prevailed until the twentieth century, a mechanistic notion strongly 
identified causation with determinism. Determinism can be understood 
as the ability to determine the future state of a physical system from its 
present state (Weinert 2005, 196). According to this view, also known 
as functional view of causation, every event has a unique cause that 
precedes it (de Laplace 1820; Stigler 1986; Cpek and Capek 1961), 
and therefore the world is seen as an ‘uninterrupted chain of causes and 
effects’ (Holbach 1770). This view has been criticised over the course
of the twentieth century for several shortcomings such as the proximity 
of elements in determining cause-effect relationships, predictability as the 
main criterion for establishing causation and the reduction of causality 
essentially to a mere temporal relationship. Discoveries of and advances in 
differential equations, atomic physics and quantum mechanics have further 
consolidated such criticisms eventually leading to the current separation 
of causality from determinism. Particularly in quantum mechanics, recent 
experiments have provided strong evidence for the validity of this notion 
of causality without determinism. In this view, consequent states of a 
quantum system are related to its antecedent states by a form of conditional 
dependency (Weinert 2005, 241) as opposed to every event having a 
unique cause that precedes it. 

Coming back to the distinction between discrete and continuous struc- 
tures, this means that in discrete systems, there is no deterministic cause- 
effect relationship, because points are totally separated from each other, 
whereas in continuous systems, causal relations can be observed and 
measured, but not predicted² (Longo 2018, 86). Though it may appear
inconsequential at first, this observation about causality has specific and
profound implications that stretch well beyond mathematical and physical
reasoning. Stating that in discrete structures such as, say, a database,
where something belongs to one category or another, no cause-effect
relationship of observed phenomena can be established but only a
probabilistic one essentially means that explanations for such phenomena 
cannot be found, only correlations. If two random variables are correlated, 
or as noted by Calude and Longo (2017), co-related, it means that they 
are associated according to a statistical measure, that they co-occur. This 
statistical measure is rendered by a correlation coefficient, a number 
between −1 and 1 that expresses the strength of the linear relationship
between two numeric variables. If two variables are positively correlated
(e.g., one tends to increase as the other increases), the correlation coefficient
will be closer to 1; if they are negatively (i.e., inversely) correlated, it will
be closer to −1; and it will be closer to 0 if there is no correlation at all. It is a
well-established fact in statistics and beyond that a correlation coefficient per se
is not enough to explain the cause of the patterns that are captured.³
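For readers unfamiliar with the coefficient, the following Python sketch computes Pearson’s correlation coefficient, the most common measure of the linear relationship described above, on invented data. Python 3.10 and later also provide `statistics.correlation` for the same purpose; the hand-written version is shown only to make the calculation visible.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance term: do the two variables deviate from their means together?
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


x = [1, 2, 3, 4, 5]
print(round(pearson(x, [2, 4, 6, 8, 10]), 6))   # 1.0: positive correlation
print(round(pearson(x, [10, 8, 6, 4, 2]), 6))   # -1.0: negative correlation
print(round(pearson(x, [1, -1, 0, -1, 1]), 6))  # 0.0: no linear correlation
```

In the third case y follows a clear, symmetric pattern in x, yet the coefficient is 0, because it only captures linear association; and in none of the three cases does the number say anything about why the values co-vary.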

The identification of statistical correlations is nevertheless an important 
factor in understanding the relationship between two quantitative variables 
and it remains an insightful method that can potentially lead to significant 
discoveries. Indeed, the observation of correlations is at the foundation 
of the classic scientific method in the sense that, starting from the
measurement of correlated phenomena, scientists have been able to formulate
theories that could be tested and later confirmed or disproved. The history 
of science is full of extraordinary achievements which originated from 
mere observations of not-so-obviously correlated phenomena, for example, 
distributional semantics theory, a famous linguistic theory that stemmed 
from the intuition of Zellig S. Harris and John R. Firth, two semanticists 
(though Harris was also a statistical mathematician). This intuition— 
famously captured by Firth’s quote ‘You shall know a word by the company 
it keeps’ (1957, 11)—acknowledges the relevance of words’ collocation 
(i.e., the place of occurrence of words) in determining their meaning. The 
core idea behind Harris and Firth’s work on collocational meaning and 
distributional semantics is that meanings do not exist in isolation; rather, 
words that occur in the same contexts tend to convey similar
meanings (Harris 1954, 156).
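The distributional intuition can be sketched in Python with nothing more than co-occurrence counts. In this minimal example, each word is represented by the words that appear within two positions of it, and two words are compared with cosine similarity. The three sentences, the window size and the function names are invented for illustration; this is not how word2vec or any particular NLP library works internally.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences: list[str], window: int = 2) -> dict[str, Counter]:
    """Represent each word by counts of the words within `window` positions of it."""
    vectors: dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


corpus = [
    "a ship sailed from naples to new york",
    "a steamer sailed from genoa to new york",
    "the mayor won the local election",
]
vecs = cooccurrence_vectors(corpus)
print(round(cosine(vecs["ship"], vecs["steamer"]), 2))  # 1.0
print(round(cosine(vecs["ship"], vecs["mayor"]), 2))    # 0.0
```

‘Ship’ and ‘steamer’ never appear together, yet they come out as maximally similar because they keep the same company. With realistic data, frequent function words dominate raw counts, which is why practical systems add weighting schemes (e.g., pointwise mutual information) or learn dense embeddings such as word2vec.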

In those days, gaining access to real language data was costly and 
very time-consuming and for a long time, it was not possible to test 
this theory. More recently, however, advances in computer science, combined
with huge quantities of naturally occurring language material, including 
digitised historical data-sets, have indeed proven that languages are not 
deterministic systems—as previously believed—but that they should be 
thought to be ‘probabilistic, analogical, preferential systems’ (Hanks 2013, 
310). As intuitively theorised in distributional semantics, words do not 
have a one-to-one relationship with meaning because meanings are not 
precise, exact or stable. On the contrary, words in isolation do not possess
any meaning and meanings can only be entailed from words’ context. As 
argued by Harris, ‘We cannot say that each morpheme or word has a single 
or central meaning, or even that it has a continuous or coherent range 
of meanings’ (Harris 1954, 151). Sixty years after its initial formulation, 
distributional semantics theory laid the basis for Google’s renowned 
word2vec algorithm, and today, it constitutes the theoretical background 
of NLP studies concerned with language and meaning, including topic
modelling itself (cfr. Sect. 4.4).

Coming back full circle to causality, correlations and patterns, a correla- 
tion measure only informs us of the strength of a relationship between two 
variables, whereas the patterns tell us that certain regularities can be found 
in how the observed variables are distributed. Hence, because they highlight
trends in the data, correlations and patterns may potentially have predictive 
power, but neither of them provides causal explanations for the analysed 
phenomena, nor do they intrinsically carry significance. In the next section,
I will elaborate on these reflections to discuss the important implications 
for society of operating predominantly within the discrete system of the 
contemporary encoding of all digital information, binary sequences of 0s
and 1s. Taking the example of analysis of material that had originally been 
conceived of as a coherent entity, i.e., continuous (e.g., a book, a collection 
of essays on the same topic, all the issues of a newspaper), I explore the 
implications of its digital encoding into discrete form through digitisation 
and subsequent digital analysis. One critical implication, I argue, is that 
the adoption of an indiscriminate, data-driven approach to analysis risks
completely disregarding context and attributing meaning to correlations and
patterns per se. Through the example of topic modelling and its application 
to the analysis of ChroniclItaly 3.0, further in the chapter, I show how the 
application of concepts and methods of the post-authentic framework to 
digital knowledge creation can be useful to prompt a critical stance towards 
computational methods and tools which I argue is urgently needed for the 
configuration of a model for knowledge production in the digital. 


4.3 MANY PATTERNS, FEW MEANINGS 


Big data analytics (cfr. Sect. 1.2) is supported by the idea that correlations 
are expected to be recurrent, i.e., they will iterate similarly along the chosen 
parameter, for example, time (Calude and Longo 2017, 602). Recurrent 
correlations are an established scientific principle and they can be observed 
in natural cycles such as the water cycle and the alternation of seasons. 
The recurrence of correlations suits deterministic systems well, in which
it is believed that one can determine the future state of a physical system
from its present state (cfr. Sect. 4.2). This is precisely what the ‘big data
philosophy’ states: because patterns are expected to be recurrent, the future 
can be predicted by statistical algorithms based on the patterns found in 
past data, without the need for causal explanation. Naturally, the larger the 
data-set, the more accurate the prediction. 
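The ‘big data philosophy’ summarised above can be caricatured in a few lines of Python. A seasonal naive forecast, a standard baseline in time-series analysis, predicts each future value as the value observed one full cycle earlier. The quarterly values and the period below are hypothetical.

```python
def seasonal_naive_forecast(history: list[float], period: int, horizon: int) -> list[float]:
    """Predict each future value as the value observed one full period earlier."""
    if period < 1 or len(history) < period:
        raise ValueError("history must cover at least one full period")
    extended = list(history)
    forecast = []
    for _ in range(horizon):
        next_value = extended[-period]  # copy the value from one cycle ago
        forecast.append(next_value)
        extended.append(next_value)
    return forecast


# Hypothetical quarterly values with a perfectly recurrent pattern.
history = [10, 25, 40, 15, 10, 25, 40, 15]
print(seasonal_naive_forecast(history, period=4, horizon=4))  # [10, 25, 40, 15]
```

The forecast is exact for as long as the cycle repeats, and it says nothing about why the cycle repeats; if the underlying process changes, the pattern-based prediction fails without warning.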

This idea that all that counts are the patterns is not in fact new and it can 
be traced back to the 1990s and to Complexity Theory (Waldrop 1992). 
Complexity Theory argues that there is a hidden order to the behaviour 
and evolution of complex systems and chaos can be made manageable 
by looking at its underlying, ubiquitous patterns. What these patterns 
show is how complex systems work, more specifically how organisations 
cope with uncertainty and nonlinearity and manage to remain stable.
The idea behind Complexity Theory is that complex systems are the 
assemblage of extremely convoluted factors which make them funda- 
mentally unpredictable. Yet, at the same time, complex systems exhibit 
order rules according to which independent actors, i.e., discrete elements, 
spontaneously self-organise. This contradictory property makes it possible 
for patterned behaviour and properties to be observed. It also means, 
however, that the meaning of any system is irrelevant as the focus is and 
remains on the observed behavioural patterns. 

One does not have to dig too deep to see how computer science has 
strongly supported Complexity Theory. Indeed, Complexity Theory fits 
perfectly with what machines excel at: finding patterns in the data (Turkle 
2014). Ever more powerful computers can be given enormous quantities of
data and instructed to find the patterns that human beings will never be 
able to find. And it works. Patterns are always found. However, despite 
appearing (at first, at least) logically sound and despite being validated by 
the cycles present in nature, the discourse surrounding big data analytics 
obscures at least four fundamental truths. Firstly, as said earlier in the 
chapter, in discrete systems such as a database, no cause-effect relationship 
of observed phenomena can be established but only correlations and 
patterns. Computers are not programmed to find meanings, only
patterns; as correlations and patterns do not intrinsically carry significance, 
this essentially means that databases provide an a-causal image of the 
world (Longo 2018, 86). Thus, what the big data hype obscures is that 
today’s computer-dominated world offers us countless patterns but no 
explanations for them, and so we are left to deal with a patterned, yet a- 
causal, way of making sense of reality. 

Secondly, the idea that information is absorbed solely from data
is also closely related to Complexity Theory. The theory argues that
complex systems are constantly altered by agents’ interactions through a
process of feedback loops; thanks to their intrinsic capacity to learn from
experience, complex adaptive systems are organic and continuously evolving.
The big data approach has essentially adopted this theory in toto, but it
seems to have failed to recognise that machines are in fact incapable of
learning. Indeed, the deterministic belief that the future state of a physical
system can be predicted from the observation of its past state, which in
any case was criticised over the course of the twentieth century and
largely disproved, as discussed in Sect. 4.2, has become conflated with the
metaphorical use of the word ‘learning’ in ML. The familiar notion of
‘learning’ confounds what learning actually means for a machine (finding
correlations and patterns but no causal explanations) with the human
capacity to understand and make sense of the world, i.e., attempting to
find causality.

Thirdly, the big data analytics’ deterministic claim that based on avail- 
able data, one can provide accurate predictions of the future without 
the need for causal explanation is provably wrong. Calude and Longo 
(2017) demonstrated that in a large enough data-set, there will always 
be correlations but most of them will be random, i.e., meaningless. This 
means that the probability that a series of correlations will be recurrent 
as in the natural cycles is extremely low; the authors explain: ‘recurrence 
may occur, but only for immense values of the intended parameters and, 
thus, an immense database’ (ibid., 609). In other words, the patterns 
found in databases do not per se constitute sufficient proof to offer reliable 
predictions of the future because most of these patterns will actually 
be false positives. In techniques such as topic modelling, an element of 
randomness is in fact built into the algorithm itself: initially, the words in
each document are assigned to topics at random. Although it is true that the
estimates are refined as the algorithm iterates over the documents, the risk
once again is to see meaning where
there is none. 
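The claim that a large enough data-set will always contain correlations, most of them meaningless, can be made tangible with a small simulation. This is only an illustration, not Calude and Longo's argument, which is a formal proof. The sketch generates variables that are pure, mutually independent noise and counts how many pairs nonetheless look 'strongly' correlated. The threshold |r| > 0.5, the 20 observations per variable and the seed are arbitrary assumptions.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def count_spurious(n_vars, n_obs=20, threshold=0.5, seed=0):
    """Count variable pairs whose |r| exceeds `threshold`.

    Every variable is independent random noise, so by construction
    none of the 'strong' correlations found has any causal meaning.
    """
    rng = random.Random(seed)
    data = [[rng.random() for _ in range(n_obs)] for _ in range(n_vars)]
    return sum(
        1
        for a, b in combinations(data, 2)
        if abs(pearson(a, b)) > threshold
    )


for n_vars in (10, 50, 100):
    pairs = n_vars * (n_vars - 1) // 2
    print(n_vars, "variables:", count_spurious(n_vars), "of", pairs, "pairs")
```

As more variables are added, the absolute number of 'strong' correlations grows even though no variable is related to any other. The pattern-finding step always succeeds. What it cannot supply is a reason for the patterns.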

Fourthly, the fact that databases are exact, i.e., discrete, perpetuates 
the false belief that data is also exact, neutral and objective. It is always 
emphasised by the ‘big data philosophy’ that statistical algorithms will find 
patterns where nobody else can, and because databases are exact, this is 
enough. What is on the contrary not at all emphasised is the subjective and 
interpretative dimension of collecting, selecting, categorising, aggregating, 
in other words of making data. Recognising that data is created makes 
the claims of absolute impartiality, exactness and reliability shaky at best 
and ethically concerning at worst, particularly when necessarily incomplete, 
biased and opaquely collected data is used to make predictions that 
influence decision-making processes or produce research findings. 

Reassuringly, these limitations have recently moved to the centre
of academic debate and have given rise to the so-called causal inference
challenge. In their book The Book of Why (2018), computer scientist Judea
Pearl and mathematician Dana Mackenzie argue that these limitations 
make the big data philosophy inadequate to solve our world’s challenges. 
They note that as current ML solutions cannot find the causal relations
between patterns, they inevitably fail to generalise beyond the domain 
of examples present in a given data-set, which most of the time will
include synthetic data (as opposed to real-world generated data). In other 
words, most current ML methods tend to ‘overfit the data’, meaning that 
‘they try to learn the past perfectly, instead of uncovering the real/causal 
relationships that will continue to hold over time’ (Gonfalonieri 2020). 
New avenues in this direction are increasingly being explored and have 
resulted in new emerging fields such as causal machine learning (see for 
instance, Pearl et al. 2016; Shanmugam 2018; Hernan and Robins 2021). 
However, although the interest in this topic has grown exponentially in 
the span of only a few years, methods and applications are still at an 
experimental stage and, to my knowledge, primarily limited to academic 
research. 


4.4 THE PROBLEM WITH TOPIC MODELLING 


The topic modelling algorithm essentially formalises distributional
semantics theory (cfr. Sect. 4.2). However, whereas the focus of distributional
semantics theory is on the meaning of a single word, topic modelling
tries to capture the overall meaning of clusters of words that appear
together (i.e., that are correlated) in a document. Put differently, as single
words do not possess any meaning in themselves and meanings can only be inferred from
their context, topic modelling assumes that groups of words also convey
collective meanings, i.e., topics. This all sounds very logical, but there is a
caveat. Similar to quantum, computational and genetic systems, languages 
are discrete representations (i.e., outputs) of fundamentally continuous 
structures (i.e., inputs). This property—called the discrete infinity of 
language—essentially means unlimited productivity from limited means 
(Chomsky and Smith 2000). It describes the ability of languages to create 
an infinite variety of expressions of thought from a limited set of discrete 
elements (Studdert-Kennedy and Goldstein 2003). The discrete infinity 
of language necessarily entails that languages are intrinsically ambiguous 
because meaning is context-bound, but significantly, it indicates that dif- 
ferent contexts shape the creation of infinite meanings. The problem with 
topic modelling is that it provides a probabilistic representation of words’ 
distributions in the ingested documents, but it is completely agnostic of the 
underlying continuous structure of such documents, such as the ambiguity 
of words’ use in each document and across texts as well as the documents’
coherent substructure, let alone their wider historical, social and cultural
entrenchment. 

As said earlier in the chapter, topic modelling provides a probabilistic 
representation of how words are distributed in documents according to 
statistical calculations, that is, correlations. This means that words are 
considered to be discrete elements; for example, in the corpus preparation 
stage (cfr. Sect. 4.5.2), words are transformed into numeric variables and
their distribution across documents is represented as a document-term matrix.
What topic modelling then does is to model the statistical associations
(co-occurrences) between these numeric variables. But topic modelling also
treats the corpus itself as a collection of discrete data, which means that 
each text is also processed as a separate entity totally disconnected from all 
the other texts in the batch. This is true regardless of whether the input 
is all the chapters from the same book, all the issues of a newspaper or 
all the abstracts ever submitted to an academic journal under the keyword 
tag ‘topic modelling’. In other words, it is a computational technique that 
efficiently identifies patterns of words’ distribution, but because it lacks 
the words’ underlying continuous structure—the infinity of language—no 
cause-effect relationship of the correlated phenomena can be established, 
i.e., the meaning of such patterns. 

Another issue with topic modelling is that it assumes that a fixed number
of topics, decided a priori and in any case more or less arbitrarily,
is represented in different proportions in all the documents. Hence, if the
algorithm is instructed to find X topics, it will build a model that
fits that number. This assumption behind the technique cannot but paint 
a rather artificial and non-exhaustive picture of the documents’ content 
as it is hard to imagine how in reality, a fixed number of topics could 
adequately represent the actual content of all the analysed documents. 
Thus, correlations will surely be identified but not all these correlations 
will necessarily carry significance, that is, meaning. Moreover, as countless 
parameters can be tweaked, the smallest change will output a different 
model, in which different correlations will be found and many others will 
be missing. Furthermore, even when the same parameters from the same
software are used on the same data-set, the algorithm will output a slightly
different model, which shows once again that patterns will always
be identified, regardless of their significance. I will return to this point in 
Sect. 4.5.3. 
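Both properties described above are visible even in a minimal implementation: the number of topics K is fixed in advance, and the model varies from run to run. The sketch below is a toy collapsed Gibbs sampler for LDA, written only for illustration. It is not the implementation used by MALLET, Gensim or any other tool. The hyperparameters (alpha, beta, the number of iterations) and the four invented Italian 'documents' are assumptions. The seed controls the random initial assignment of every word to a topic, so different seeds can produce different top words for the same corpus and the same K.

```python
import random
from collections import Counter


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative only).

    docs: list of token lists; k: number of topics, fixed a priori.
    Returns (doc_topic, topic_word): per-document topic proportions
    and per-topic word counts.
    """
    rng = random.Random(seed)
    v = len({w for d in docs for w in d})
    # Random initial topic for every token: the built-in randomness.
    z = [[rng.randrange(k) for _ in d] for d in docs]
    ndk = [[0] * k for _ in docs]          # tokens per (document, topic)
    nkw = [Counter() for _ in range(k)]    # tokens per (topic, word)
    nk = [0] * k                           # tokens per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            ndk[d][t] += 1
            nkw[t][w] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the token's current assignment before resampling.
                ndk[d][t] -= 1
                nkw[t][w] -= 1
                nk[t] -= 1
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][w] += 1
                nk[t] += 1
    doc_topic = [
        [(c + alpha) / (len(doc) + k * alpha) for c in counts]
        for counts, doc in zip(ndk, docs)
    ]
    return doc_topic, nkw


def top_words(topic_word, n=3):
    """The n most frequent words assigned to each topic."""
    return [[w for w, _ in counts.most_common(n)] for counts in topic_word]


# Invented toy corpus; K=3 is imposed whatever the data contain.
corpus = [
    "lavoro miniera sciopero paga lavoro".split(),
    "chiesa festa santo processione festa".split(),
    "sciopero paga lavoro operai miniera".split(),
    "festa santo chiesa banda processione".split(),
]
for seed in (1, 2):
    _, tw = lda_gibbs(corpus, k=3, seed=seed)
    print("seed", seed, top_words(tw))
```

The sampler returns exactly K topics whether or not the documents contain K themes. Comparing the printed topics across seeds shows how the initial randomness carries through to the final model.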


4.5 ANALYSIS OF DIGITAL OBJECTS: A POST-AUTHENTIC APPROACH TO TOPIC MODELLING


The post-authentic framework to digital knowledge creation responds
to the urgent need to establish critical data and visualisation
literacy in the current landscape—both public and academic—in which 
computational techniques and outputs are predominantly framed as and 
often believed to be exact, final, objective and true. Whilst exploiting 
the new opportunities offered by computational technologies, the post- 
authentic framework rejects an uncritical adoption of digital methods, and 
it promotes a model not simplistically oriented towards problem-solving, 
solution automation and sleek interface designs but towards encourag- 
ing critical engagement and active participation. This ultimately means 
recognising that knowledge is fluid and that the complex challenges we 
face today therefore require a model of knowledge production that fosters 
symbiotic collaborations, fluid exchanges and mutualistic contributions, as 
opposed to hierarchical separation and competition. 

As an example of how the application of the post-authentic framework 
can contribute towards fluid processes of knowledge creation in the digital, 
including the need for a less naïve conceptualisation of computational 
techniques, digital objects and methods, I discuss here the third use case 
of the book: analysis of digital objects. The example of topic modelling 
demonstrates how critical engagement with computational techniques is 
urgently required to meet the uncertain and problematic aspects of digital 
research. For example, in fields such as DH, in which this technique is used
extensively, a recent survey of LDA topic modelling (Du 2019) found
that 74% of the surveyed studies did not report how their corpora were
prepared, more than 70% did not report which tool was used to train their
topic models, almost 57% did not report how many topics were trained, and
about 90.5% did not report how their topic models were evaluated.

DH is not at all an isolated case, however. Though with some dif- 
ferences, a similar trend has also been found in software engineering 
research (Silva et al. 2021) where topic modelling is widely used to 
analyse online conversations among developers or to improve software 
engineering tasks such as source code comprehension. From the analysis of 
111 relevant papers, Silva et al. (2021) found both general inconsistency
and the adoption of opaque methods in topic modelling practices, overall
pointing to a degree of uncertainty about the specificity of the
technique itself. The highest inconsistency was found with reference to
tasks such as choosing the number of topics, naming the topics and
evaluating the topics’ semantic interpretability. The authors attributed the
lack of specificity of the technique to the fact that the majority of the
surveyed papers had employed LDA ‘as is’, that is, they had used it as
off-the-shelf software with its default parameters. This approach, however,
is generally not encouraged; computer scientists openly acknowledge that 
finding the meaning behind the identified patterns is highly dependent on 
the specifics of the sources because, as argued by Hindle et al. (2015, 510), 
‘LDA does not look for the same patterns that people do’. 

In this part of the chapter, I illustrate how the post-authentic framework 
can be applied to topic modelling to guide a more mindful understanding 
of the materiality of the sources. To this end, I deliberately choose cultural 
heritage material, sources that are inevitably problematic from a computa- 
tional point of view. I then focus on the key aspects of topic modelling that 
are highly dependent on the sources and which in my experience have the 
most significant impact on the results: pre-processing, corpus preparation 
and deciding the number of topics. As a case example, I use the already 
discussed Italian American newspapers as collected in ChroniclItaly 3.0 
(cfr. Chaps. 2 and 3); my aim is to emphasise how preparing the material 
for the analysis is part of the analysis itself. My discussion demonstrates 
how, far from being fully automated, neutral and objective, the analysis 
of a digital object requires the analyst to make countless decisions which
nonetheless differ from the ones required when preparing the material for
enrichment, even when the same sources are used. Indeed, engagement 
with the technique starts much earlier than the algorithm’s implementation 
stage, which in any case should also not be performed as a fully automatic 
operation. The application of the post-authentic framework allows me to
show that LDA may well be an unsupervised technique, but this simply
means that it works with unlabelled, unstructured data, and not at all, despite what
may be generally believed, that it requires no human intervention.


4.5.1 Pre-processing


In Chap. 3, I illustrated how pre-processing operations are far from being 
standard and how it is in fact required that each intervention is carefully 
assessed by scholars and practitioners and evaluated on a case-by-case basis. 
In my discussion, I considered the many influential factors at play (e.g., the 
materiality of the source, the specific task to be performed, the available
resources, both economic and technical) and illustrated how they in turn 
are embedded in a complex, wide net of co-dependent actors, elements and 
circumstances which have influenced each other and will in turn influence 
current and future interventions. The same considerations apply to the 
analysis of a digital object; this, I maintain, requires a high level of critical 
engagement with the chosen method well before the algorithm’s
implementation stage. In the case of topic modelling, for example, which 
takes as its input unstructured data, e.g., plain text, the first thing one needs 
to decide is the scope (cfr. Sect. 3.4), that is, what to consider as documents 
(i.e., the input) (see for instance, Miner 2012). Topic modelling aims to
represent documents as probabilistic mixtures of topics, each topic being a
distribution over words; hence, in a book, the documents could be the book’s
pages, whereas on a newspaper’s page, they could be individual articles and
so on. Conceptually, it clearly makes a difference whether one searches for
the topics in a chapter or for the topics in each page of that chapter. But this is an important decision to
make also from a pragmatic point of view: as topic modelling is essentially 
a statistical method, the length of each modelled item, i.e., the document, 
does matter. And yet, although this is a decisive factor, studies
using this method rarely specify how the criteria to decide the scope of the 
documents are assessed and, even when mentioned, they are referred to 
vaguely. In Silva et al.’s survey of topic modelling in software engineering 
research (2021), for example, the authors found that 86% of the papers did not mention
such criteria at all, nor did they acknowledge document length as
an important factor; they also found that even when the relevance of
vocabulary size was acknowledged (14%), about half of these papers (7.4% of the total) did not
specify the selection criteria or the documents’ length.
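The practical weight of this decision can be seen in a small sketch. The same invented page is handed to the model either as one document or as several article-level documents, and the sketch reports how many documents, and of what length, the algorithm would receive. The blank-line separator marking article boundaries is a hypothetical convention; real newspaper OCR rarely marks articles this cleanly.

```python
def make_documents(page_text, unit="page", separator="\n\n"):
    """Split one page of text into documents at the chosen granularity.

    unit="page": the whole page is a single document.
    unit="article": each block between `separator`s is a document
    (a hypothetical convention for marking article boundaries).
    Tokenisation is naive whitespace splitting; punctuation is kept.
    """
    if unit == "page":
        blocks = [page_text]
    elif unit == "article":
        blocks = [b for b in page_text.split(separator) if b.strip()]
    else:
        raise ValueError(f"unknown unit: {unit}")
    return [block.lower().split() for block in blocks]


page = (
    "Sciopero alla miniera. Gli operai chiedono paga migliore.\n\n"
    "Festa del santo patrono con processione e banda."
)

for unit in ("page", "article"):
    docs = make_documents(page, unit)
    print(unit, len(docs), "document(s), lengths:", [len(d) for d in docs])
```

At the page level, the strike and the religious festival share one word distribution. At the article level, they become separate, shorter documents. The topic model receives different statistical input from the same source.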

In the case of ChroniclItaly 3.0, I considered that each file in the
collection corresponds to the first page of each issue published by the 
newspapers on a certain date. This structure mirrors the way the collection 
was digitised by the Library of Congress, evidencing once more the 
inseparable complexity of relations between digital material and its wider 
entrenchment in the surrounding digital infrastructure that created it 
and/or provides it. I therefore defined each file/issue in the collection as
a document; the decision had the dual advantage of modelling
the documents according to the events narrated on a day/issue basis while 
following the Library of Congress metadata schema. 

Beyond preparatory operations such as removing stopwords, lowercasing
and removing punctuation, numbers and special characters (cfr. Chap. 3),
the specific task of topic modelling requires additional linguistic decisions
to be evaluated; here I discuss two of them: stemming and lemmatisation.
Although both aim to obtain a word root by reducing the inflection in 
words, these operations are built on very different assumptions. Stemming 
deletes the initial or final characters in a token based on a list of common 
prefixes and suffixes that may typically occur in the inflected words of a lan- 
guage (e.g., states > state). It is therefore language-dependent as it relies 
on a limited set of rules that apply only to languages with
specific inflection patterns. Consequently, for languages that follow fairly
regular inflection rules such as English, stemming may work reasonably
well, but when applied to highly inflectional languages such as Italian, with its
many exceptions and irregularities, the algorithm would almost certainly
perform poorly. Another strong limitation of stemming is that in many
cases—including low-inflectional languages—the output would not be an 
actual word, meaning that the operation is likely to introduce new errors. 
On the other hand, as it is not a particularly advanced technique, stemming 
does not require a long processing time or processing power, and therefore 
this solution may be implemented when working with particularly large 
corpora or when constrained by time limitations. 
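These limitations show up even in a deliberately naive suffix stripper. The sketch below is not the Porter or Snowball algorithm. Its suffix lists are small, hypothetical examples chosen to show how a fixed rule set produces non-words in English and how it both splits and conflates Italian forms.

```python
# Hypothetical, deliberately small suffix lists, tried in order.
SUFFIXES = {
    "en": ["ing", "ed", "s"],
    "it": ["iamo", "ando", "are", "ire", "ata", "ato", "i", "e", "o", "a"],
}


def naive_stem(token, lang="en", min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` chars."""
    for suffix in SUFFIXES[lang]:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


for word in ("states", "studies", "running"):
    print(word, ">", naive_stem(word))
for word in ("mangiare", "mangiamo", "mento", "mente", "mentire"):
    print(word, ">", naive_stem(word, "it"))
```

In English, 'states' becomes 'state', but 'studies' and 'running' become the non-words 'studie' and 'runn'. In Italian, two forms of the same verb (mangiare, mangiamo) receive different stems. Meanwhile mento ('chin' or 'I lie'), mente ('mind' or 'he/she lies') and mentire ('to lie') all collapse into 'ment'.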

Lemmatising is on the contrary a much more sophisticated technique as 
it is based on more solid linguistic principles than stemming. By means of 
detailed dictionaries that contain lemmas and by examining words’ context, 
a lemmatising algorithm analyses the morphology of each word and it then 
transforms it into its dictionary form, or lemma (e.g., better > good). Especially
in the case of topic modelling in which the output is essentially a list of 
words without any context, lemmatising can be very helpful to distinguish 
between homonyms, words that have the same spelling, sometimes the 
same pronunciation too but which in fact possess different meanings. For 
example, the word mento in Italian can mean either ‘chin’ or ‘I lie’. A 
lemmatising algorithm would theoretically be able to infer the use of
mento from its context and distinguish it from its homonym; in this case, 
the different outputs would be mento (i.e., chin) for the former and mentire 
(i.e., to lie) for the latter. Because of its complexity, however, lemmatising 
may require a long time and very high processing power to perform, and so 
in the case of large size collections or depending on the available means and 
resources, it may not be ideal. Additionally, while lemmatising is
effective at differentiating between homonyms, the reduction
of all inflected words to their lemma may cause information loss. For
instance, it would no longer be possible to recognise the tense (present, 
past, future) or the grammatical person (I, they, you, etc.) of the verbs,
the gender or number of the nouns, the degree of the adjectives (e.g., 
superlative, comparative), etc. 
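A toy lookup table can sketch both the lemmatiser's dependence on a dictionary and context and the information loss it entails. The mini-dictionary and the single context rule below are hypothetical simplifications: an article before mento selects the noun reading, otherwise the verb is assumed. Real Italian lemmatisers, typically driven by part-of-speech tagging, are far more elaborate.

```python
# Hypothetical mini-dictionary: inflected form -> lemma per reading.
LEMMAS = {
    "mento": {"NOUN": "mento", "VERB": "mentire"},
    "mentivo": {"VERB": "mentire"},
    "mentirono": {"VERB": "mentire"},
    "mentiamo": {"VERB": "mentire"},
}
ARTICLES = {"il", "lo", "un", "del", "al", "nel", "sul"}


def lemmatise(tokens):
    """Replace each known token with its lemma, using the previous token
    as (very crude) context to choose between noun and verb readings."""
    out = []
    for i, tok in enumerate(tokens):
        options = LEMMAS.get(tok)
        if options is None:
            out.append(tok)  # unknown word: left unchanged
            continue
        prev = tokens[i - 1] if i > 0 else ""
        pos = "NOUN" if prev in ARTICLES and "NOUN" in options else "VERB"
        out.append(options.get(pos, next(iter(options.values()))))
    return out


print(lemmatise("si grattava il mento".split()))
print(lemmatise("non mento mai".split()))
print(lemmatise("io mentivo e loro mentirono".split()))
```

The homonym is resolved: 'il mento' stays a noun, while 'non mento' becomes mentire. But mentivo (first person, imperfect) and mentirono (third person plural, past) both become mentire, so person, number and tense disappear from the output.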

Whether this type of information is relevant depends
once again on several factors such as the type of data-set (e.g., size, 
content), the context of the digital analysis, the language of the data-set 
and the specific research question(s); researchers should therefore carefully 
evaluate pros and cons of implementing this operation. For example, in 
researching narratives of migration as they were told by Italian American 
migrants, the cons of implementing either stemming or lemmatising would
in my opinion outweigh the pros. Italian is a highly inflectional language
and a great deal of linguistic information is encoded in suffixes and
prefixes; stemming is therefore ill-suited to it. Similarly, lemmatising the corpus
would also cause the loss of information encoded in inflected words (e.g., 
verbs expressed in the first person, collective concepts expressed by plural 
nouns) which could bring valuable insights into the cognitive, subjective 
dimension of the stories told by the migrants. 

Finally, whether to perform either of these operations is very
much dependent on the language of the data-set, not just because dif- 
ferent languages have different inflection rules, but crucially also because 
not all languages are equally resourced digitally. Indeed, as discussed in 
Sect. 2.2, the digital consequence of the fact that most mass digitisation
projects have been carried out in the United States and later in Europe
is that computational resources available for languages other than English
remain, on the whole, scarce. Such Anglophone-centricity is
often still a barrier to researchers, teachers and curators whose sources 
are in languages other than English. Indeed, the comparative lack of 
computational resources in other languages often dictates which tasks can 
be performed, with which tools and through which platforms (Viola and 
Fiscarelli 2021b). Moreover, even when adaptations for other languages 
may be possible, identifying which changes should be implemented, and 
perhaps more importantly, understanding the impacts these may have, is 
often unclear (Mahony 2018). This includes lemmatising algorithms and 
dictionaries which do not yet exist for all languages; therefore, for particularly
under-resourced languages, stemming may be the only, far from ideal, 
option. 


4.5.2 Corpus Preparation


There are several libraries, for example, in Python or R, as well as off- 
the-shelf tools (e.g., MALLET) that implement LDA for topic modelling. 
Some allow for more sophisticated parameters than others, but generally 
speaking, they all follow the same principles that I have already discussed: 
a topic modelling algorithm models a number of documents to find 
correlations essentially combining term frequency and word collocation 
operations. In order to model topics from unstructured text, the material 
first needs to be converted into a structured model that allows the 
algorithm to perform such calculations, for example, through a method 
called bag of words (BoW). What BoW does is to first transform the words 
in the documents into numbers, i.e., into ids; this operation is typically 
called ‘dictionary’. It then builds a matrix based on the frequency of the 
words in the documents. 
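The two BoW steps just described can be sketched in plain Python: assigning ids (the 'dictionary') and counting words per document. Gensim exposes the same idea through its corpora.Dictionary class and its doc2bow method. The code below is an independent illustration, not Gensim's implementation, and its id assignment (by order of first appearance) may differ from any library's.

```python
from collections import Counter


def build_dictionary(docs):
    """Map every distinct token to an integer id (first-seen order)."""
    token2id = {}
    for doc in docs:
        for tok in doc:
            token2id.setdefault(tok, len(token2id))
    return token2id


def doc_to_bow(doc, token2id):
    """Sparse bag of words: sorted (id, count) pairs; unknown tokens dropped."""
    counts = Counter(token2id[t] for t in doc if t in token2id)
    return sorted(counts.items())


def dense_matrix(docs, token2id):
    """Document-term matrix: one row per document, one column per id."""
    rows = []
    for doc in docs:
        row = [0] * len(token2id)
        for tok_id, count in doc_to_bow(doc, token2id):
            row[tok_id] = count
        rows.append(row)
    return rows


docs = [["lavoro", "paga", "lavoro"], ["paga", "sciopero"]]
d = build_dictionary(docs)
print(d)                          # {'lavoro': 0, 'paga': 1, 'sciopero': 2}
print(doc_to_bow(docs[0], d))     # [(0, 2), (1, 1)]
print(dense_matrix(docs, d))      # [[2, 1, 0], [0, 1, 1]]
```

Once this step is complete, each document is only a row of counts. Word order and every other textual relation have been discarded before the topic model sees the data.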

The generation of a BoW provides a notable example of the decisive 
influence of the analyst on algorithmic processes and therefore ultimately, 
on the output. Specifically, in order to prepare the dictionary, i.e., the 
unique id assignment, the analyst has several so-called optimising oper- 
ations at their disposal. For example, one might decide to filter out 
‘extremes’, terms in the collection that are particularly frequent or infre- 
quent; this operation may be performed in order to obtain what is believed 
to be a more representative core vocabulary. There are several ways to 
perform this task; for instance, the Python library Gensim (Řehůřek and
Sojka 2010) has a built-in function called filter_extremes which
filters out tokens in the dictionary based on their frequency of occurrence.
The parameters are defined by the user, who can decide (though, one
might argue, somewhat arbitrarily) to keep only tokens that appear
in a defined range of documents (i.e., in no fewer than X documents and
in no more than a given proportion of all documents) and/or to keep only
the N most frequent tokens.
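The effect of this filtering, and how much it depends on the chosen thresholds, can be illustrated by re-implementing the logic described above. The function only mirrors the idea behind Gensim's Dictionary.filter_extremes, whose parameters include no_below (an absolute document count), no_above (a fraction of all documents) and keep_n. It is an illustrative sketch, not Gensim's code. The default values and the toy corpus are assumptions chosen for this example and differ from Gensim's defaults.

```python
from collections import Counter


def filter_extremes(docs, no_below=2, no_above=0.5, keep_n=None):
    """Return the vocabulary that survives frequency-based filtering.

    A token is kept if it occurs in at least `no_below` documents and in
    at most `no_above` (a fraction) of all documents; optionally only the
    `keep_n` tokens with the highest document frequency are retained.
    Defaults are illustrative, not Gensim's.
    """
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    max_docs = no_above * len(docs)
    kept = [t for t, df in doc_freq.items() if no_below <= df <= max_docs]
    kept.sort(key=lambda t: (-doc_freq[t], t))  # most frequent first
    if keep_n is not None:
        kept = kept[:keep_n]
    return set(kept)


docs = [
    ["il", "lavoro", "paga"],
    ["il", "lavoro", "sciopero"],
    ["il", "festa", "santo"],
    ["il", "festa", "emigrazione"],
]
print(sorted(filter_extremes(docs)))              # ['festa', 'lavoro']
print(sorted(filter_extremes(docs, no_below=1)))  # rare words survive too
```

Under the first setting, a word such as emigrazione disappears because it occurs in only one document, even though a single occurrence might be what signals a new theme. Changing one threshold changes the vocabulary, and with it every model trained afterwards.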

Another very common technique, originating in the field of IR and
believed to contribute towards obtaining better topic modelling results,
is the term frequency–inverse document frequency method (TF-IDF).
The method scores the ‘importance’ of a word, also known as its weight,
according to its relative frequency, i.e., the frequency of occurrence of
that word in a document weighed against the number of documents in the
collection in which it appears. In this way, the weight of words that are ‘expected’
to appear more frequently (generally speaking, non-salient words such as
prepositions, articles and so on, though this is also specific to the material)
is reduced accordingly. These preparatory operations are believed to help
optimise a corpus for IR tasks (not just topic modelling) and in most 
cases, they may succeed. The assumption is, however, that a word is as 
important as its relative frequency, which may be true most times, but 
not always. Indeed, capturing words that are very rare
or that appear in very few documents may be just as valuable, since they
may indicate a sudden shift in the vocabulary used, which may in turn
signal a linguistic change or perhaps even a conceptual one. Furthermore, 
and perhaps even more significantly, these techniques only consider the 
formal frequency of a word, meaning that they do not cater for how that 
word is used. In the words of David Blei (Blei 2012, 82), one of the
creators of LDA: ‘One assumption that LDA makes is the “bag
of words” assumption, that the order of the words in the document does 
not matter’. This approach, defined as ‘unrealistic’ by Blei himself, may 
work well for grammatical articles, prepositions or particularly recurrent 
OCR errors, but as no semantic detection is formally conducted, the 
frequency of a word, misleadingly referred to as the weight, becomes the 
unique, determining factor in assessing whether a word is worth keeping 
or not. What is important to remember is that what is worth keeping 
for an algorithm may not reflect at all the writer’s original intention. 
Languages may be probabilistic systems, but since words do not have a 
one-to-one relationship with meaning, they are fundamentally ambiguous, 
preferential systems. For this reason, researchers and practitioners should 
assess carefully whether using relative frequency methods is the best option 
when preparing the corpus to train the topic models. For example, research 
has shown that statistically more accurate models do not necessarily lead 
to a higher interpretability of the results (Jacobi et al. 2015). 
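A few lines are enough to show both the logic of TF-IDF weighting and the bag-of-words assumption. The sketch uses one common variant of the formula, raw term count multiplied by log(N / df). Libraries such as Gensim and scikit-learn apply different smoothing and normalisation by default, so their numbers will differ. The sentences are invented.

```python
import math
from collections import Counter


def tfidf(docs):
    """Weight = term count in the document * log(N / document frequency)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return [
        {tok: count * math.log(n / df[tok]) for tok, count in Counter(doc).items()}
        for doc in docs
    ]


docs = [
    "la paga degli operai".split(),
    "la festa degli italiani".split(),
    "la miniera e la paga".split(),
]
weights = tfidf(docs)
print(weights[0])

# Bag-of-words assumption: word order is irrelevant, so a reversed
# sentence receives exactly the same weights.
reordered = [list(reversed(docs[0]))] + docs[1:]
print(tfidf(reordered)[0] == weights[0])  # True
```

The article 'la' receives zero weight because it occurs in every document, and the rarer 'operai' outweighs 'paga'. Reversing the sentence changes nothing: the weight is computed from counts alone, and nothing in it reflects how the words are used.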

In an attempt to retain the meaning of words, one method that aims to
compensate for this shortcoming is preparing the corpus as a dictionary of
n-grams, typically bi-grams or tri-grams. These are pairs or triples of words
that occur together more often than would be expected if they occurred
independently of each other. Several studies (see for instance, Wallach
2006; Wang et al. 2007; Kherwa and Bansal 2020) have indeed reported
that using bi-grams to prepare the corpus may increase topics’ interpretability
as well as improve the scores of statistical measures such as perplexity and
coherence (cfr. Sect. 4.5.3), developed to help researchers and practitioners
optimise topic modelling results. Unfortunately, preparing the corpus as a
dictionary of n-grams is a computationally intensive process which may be
costly and time-consuming, especially in the case of very large repositories.
Furthermore, researchers working on historical material which typically 
contains a high number of OCR errors should consider the actual added 
value of using this technique. Studies on topic modelling which suggest 
novel IR techniques or improved corpus preparation methods such as 
those discussed here and which report an increase in the models’ quality 
typically make use of digitally born data such as online film reviews, blogs, 
news websites’ headlines or contemporary conference proceedings. Being 
digitally born, these data-sets are of very high quality, especially compared 
to digitised historical material. Indeed, the number of OCR errors in
historical collections inevitably skews the output, as each word containing
an error will be interpreted by the algorithm as a new word, even if it differs
from the correct form by only one character. Although pre-processing steps are taken to improve the
quality of the collection, many errors may remain. In most cases, these 
errors would not prevent a human from reading and understanding, but 
they will interfere with how a machine processes the text. As LDA is a 
probabilistic method, regardless of the specific variations in the chosen pre- 
processing and corpus preparation techniques, the results will be heavily 
reliant on the data quality. 
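The effect of OCR errors on the vocabulary can be shown with a toy example (the sentences below are invented, not taken from any collection): a single misread character turns one word type into two, and the occurrences of the original word are split between them. The helper names are hypothetical; `edit_distance` is a plain Levenshtein distance added only to make the one-character difference explicit.

```python
from collections import Counter


def vocabulary_counts(text):
    """Lower-case, split on whitespace and count word types."""
    return Counter(text.lower().split())


def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character edits."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


clean = "the immigrant newspaper reported that immigrant workers"
ocr = "the immigrant newspaper reported that immlgrant workers"

print(len(vocabulary_counts(clean)), len(vocabulary_counts(ocr)))  # → 6 7
print(vocabulary_counts(ocr)["immigrant"], vocabulary_counts(ocr)["immlgrant"])  # → 1 1
print(edit_distance("immigrant", "immlgrant"))  # → 1
```

For the algorithm, "immlgrant" is simply a different word, so every such error both inflates the vocabulary and weakens the statistics of the correctly spelled form.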

Finally, it is worth remembering that, due to the intrinsically unstable and
non-deterministic nature of topic modelling, assessing how and to what extent
any of these corpus preparation techniques actually improves the quality of 
the models remains difficult. Users should indeed be aware that findings 
obtained with topic modelling can never be fully replicated or generalised 
even if the same data-sets are used, the same steps are implemented and 
the same LDA settings are chosen from the same library/tool (Silva et al. 
2021, 120). The post-authentic framework acknowledges such limitations
and is wary of drawing conclusions based solely on topic modelling
findings.


4.5.3 Number of Topics 


The weaknesses and limitations as well as the dangers of overly trusting the 
capacity of topic modelling to find meaningful patterns have been openly 
acknowledged by several authors, including its very creators. Already in 
2009, Chang et al. (2009), for example, compared the task of interpreting 
the topics, i.e., finding the semantic meaning of the discovered patterns, to 
the Chinese ritual of reading tea leaves. The authors wanted to warn users 
of the high risk of attributing meaning to patterns and trends that in reality
may be ‘spurious’ in the mathematical sense, i.e., meaningless (Calude and 
Longo 2017) (cfr. Sect. 4.3). Naturally, the risk is even higher when the 
technique is adopted uncritically, especially in fields outside of computer 
science. The authors clarified that although typically it is implicitly assumed 
that the identified latent spaces will be semantically salient, in reality, this is 
not at all what the promise of topic modelling is about. Since then, others 
(see for instance Bail 2018) have also openly acknowledged the limitations 
of the technique and repeatedly attempted to reframe topic modelling as 
‘a tool for reading’ rather than a tool for meaning, that is, an exploratory 
tool which in order to obtain more nuanced and reliable findings, should 
be integrated with other methods. In this respect, for instance, sociologist 
Chris Bail (ibid.) notes: 


Despite this rather humble assessment of the promise of topic models, many 
people continue to employ them as if they do in fact reveal the true meaning 
of texts, which I fear may create a surge in “false positive” findings in studies 
that employ topic models. 


The application of the post-authentic framework to topic modelling 
helps reframe the technique as a statistical tool and resizes the user’s 
expectations accordingly. Topic modelling posits a set of multinomial 
distributions over words—misleadingly called topics—as being present in 
each document in various proportions; it provides fairly accurate models of 
documents based on their words’ distribution as grouped into clusters. This 
is valuable for obtaining a corpus representation through its words’ distri- 
bution and/or for predicting a model of unseen text but the commonly 
shared belief that these identified word clusters will also be semantically 
meaningful, i.e., that they will be topics in the human sense, remains only 
anecdotal (Chang et al. 2009). 
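To make concrete what a set of multinomial distributions over words "present in each document in various proportions" means, the following sketch simulates the generative story usually associated with LDA: each topic is drawn as a probability distribution over the whole vocabulary, each document draws its own topic proportions, and every word is produced by first picking a topic and then a word from that topic. The function names, the toy vocabulary and the `alpha` and `beta` values are illustrative assumptions. Nothing in the procedure refers to meaning; a "topic" here is only a list of probabilities.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalising independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_index(probs, rng):
    """Pick an index with probability proportional to `probs`."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(vocab, n_topics, n_docs, doc_length, alpha=0.5, beta=0.5, seed=0):
    """Simulate LDA's generative process (illustrative parameter values)."""
    rng = random.Random(seed)
    # A 'topic' is only a probability distribution over the whole vocabulary.
    topics = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)]
    corpus = []
    for _ in range(n_docs):
        # Each document gets its own proportions over all n_topics topics.
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_length):
            z = sample_index(theta, rng)                       # choose a topic
            words.append(vocab[sample_index(topics[z], rng)])  # then a word from it
        corpus.append(words)
    return topics, corpus


topics, corpus = generate_corpus(["work", "strike", "church", "feast"],
                                 n_topics=2, n_docs=3, doc_length=6)
for doc in corpus:
    print(doc)
```

Fitting a topic model runs this story in reverse: given only the words, it estimates the distributions that would most plausibly have generated them.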

The high risk of finding patterns that are in reality meaningless can be 
exemplified by the challenge of finding the so-called ‘optimal’ number of 
topics. This task requires the user to instruct the algorithm about how
many word distributions it has to search for in the corpus, a number which
of course cannot be known in advance. Depending on individual cases, sometimes
researchers and practitioners may know the collection extensively enough
to feel confident about what this number might be; others prefer building
multiple models with different numbers of topics to subsequently compare
the various compositions of the topics (Viola and Verheul 2019b). If on
the one hand this approach allows the researcher to closely examine the
varied topic structures before deciding on the most coherent model,
on the other it may lead analysts to prefer a
model that seems to confirm their a priori ideas, thus resulting in biased
interpretations. This approach may work fairly well in those cases when 
the analyst has extensive knowledge of the material, the field and the 
period of reference of the collection among others, but it is generally 
not recommended in statistics; in the words of statistician Stephen M. 
Stigler: ‘Beware of the problem of testing too many hypotheses; the more 
you torture the data, the more likely they are to confess, but confessions 
obtained under duress may not be admissible in the court of scientific 
opinion’ (Stigler 1987). 
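Building several models with different numbers of topics can be sketched with a minimal collapsed Gibbs sampler, one standard way of fitting LDA. This is a teaching sketch under stated assumptions, not the implementation used in any of the cited studies: the names `train_lda` and `top_words`, the toy documents and the `alpha`, `beta` and `n_iter` values are all hypothetical. The `seed` parameter drives both the random initial assignment and the sampling, so changing it will typically change which words end up grouped together, which is one concrete source of the instability discussed above.

```python
import random


def train_lda(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Minimal collapsed Gibbs sampler for LDA (illustrative, not optimised).

    Returns the vocabulary and the topic-word count matrix after sampling.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_vocab for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][index[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, z = index[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Unnormalised conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_vocab * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights, k=1)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    return vocab, topic_word


def top_words(vocab, topic_word, n=3):
    """Most frequent words per topic, i.e. what is usually displayed as 'the topic'."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


docs = [
    ["strike", "union", "wages", "strike"],
    ["church", "mass", "priest", "church"],
    ["union", "workers", "strike", "wages"],
    ["priest", "church", "feast", "mass"],
]

# One model per candidate number of topics, to be compared afterwards.
for k in (2, 3, 4):
    vocab, topic_word = train_lda(docs, n_topics=k, seed=0)
    print(k, top_words(vocab, topic_word))
```

Comparing the printed word lists across values of `k` is precisely the step at which an analyst may, consciously or not, favour the model that best matches their expectations.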

More often, however, very little is known about the actual content of 
the documents as true content is exactly what the technique is wrongly 
believed to be able to find, which provides the original justifying argument 
for using the method. It goes like this: due to the increasingly large size of 
available digital material, it is not possible for researchers and practitioners 
to explore the documents through traditional close reading methods; not 
only would this be too time-consuming but also somewhat less efficient as a 
machine will always outperform humans in identifying patterns. Although 
this is in principle true as clarified earlier, the assumption that all the found 
patterns are intrinsically meaningful is not. To meet this challenge, research 
has been conducted towards implementing statistical methods that could 
help researchers and practitioners find the craved ‘optimal number of 
topics’. Two of the most common methods are model perplexity and 
topic coherence, measures that score the statistical quality of different 
topic models based on the topics’ compositions in several models. Though 
not unanimously, the underlying assumption behind these techniques is
that a higher statistical quality yields more interpretable topics. Model
perplexity (a measure derived from the predictive likelihood) quantifies
how well a pre-trained model predicts new (i.e., unseen) text. The
lower the perplexity value, the better the model predicts the words
of the held-out documents. However, studies have shown
that optimising a topic model for perplexity does not necessarily increase 
topics’ interpretability, as perplexity and human judgement are often not 
correlated, and sometimes even slightly anti-correlated (Jacobi et al. 2015, 
7). 
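Perplexity is conventionally computed as the exponential of the negative average log-probability that the model assigns to held-out tokens. The sketch below assumes that the topic proportions `theta` of the held-out document have already been estimated (in practice this step is itself non-trivial); the numbers are invented for illustration and the function names are hypothetical.

```python
import math


def token_probability(theta, phi, word_id):
    """p(w | d) under LDA: the topics' word probabilities mixed by the document's proportions."""
    return sum(t * topic[word_id] for t, topic in zip(theta, phi))


def perplexity(token_probs):
    """exp of the negative average log-probability of the held-out tokens."""
    if not token_probs:
        raise ValueError("perplexity needs at least one held-out token")
    avg_log_lik = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-avg_log_lik)


# Invented numbers: two topics over a three-word vocabulary.
phi = [[0.7, 0.2, 0.1],
       [0.1, 0.3, 0.6]]
theta = [0.5, 0.5]          # assumed topic proportions of the held-out document
held_out = [0, 0, 2, 1]     # word ids of the held-out tokens

probs = [token_probability(theta, phi, w) for w in held_out]
print(perplexity(probs))
```

A model that spreads its probability uniformly over a vocabulary of V words has perplexity V, which gives a rough reference point. Nothing in the calculation examines whether the words grouped in a topic make sense together, which is why low perplexity and human judgement can diverge.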

Topic coherence was developed to compensate for this shortcoming 
and it has become popular over the years. What the method is designed 
to do is to model human judgement by scoring the composition of the
topics based on how coherent, i.e., interpretable, they are (Röder et al. 
2015). If the coherence score increases as the number of topics increases, 
for example, that would suggest that the most interpretable model is 
the one that displays the highest coherence value before flattening out 
or dropping. Both techniques are widely used to determine the optimal 
number of topics; the truth is, however, that neither of these measures is 
ideal because what they actually score is the probability of observations 
and not their degree of semantic meaning (Chang et al. 2009). In a study 
by Chang et al. (2009) about topics’ interpretability, the authors noted 
that these traditional metrics do not in fact capture whether topics are 
interpretable or not as they optimise topic models for likelihood-based 
measures but, as clarified earlier (cfr. Sect. 4.5), ‘LDA does not look for the 
same patterns that people do’ (Hindle et al. 2015, 510). In their study,
Chang et al. therefore suggest that practitioners adopt a more critical
assessment of the topics’ quality.
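Coherence measures come in several variants. As an assumption for illustration, the sketch below implements the so-called UMass measure, which scores a topic's top words by how often they occur in the same documents, together with a hypothetical `pick_k` helper that applies the "highest value before flattening out or dropping" heuristic described above. The coherence scores passed to `pick_k` in the example are made up, not results from any study.

```python
import math


def umass_coherence(top_words, docs):
    """UMass coherence of one topic, given its top words in descending order.

    Sums log((D(w_later, w_earlier) + 1) / D(w_earlier)) over ordered word pairs,
    where D counts the documents containing the given word(s). Higher is read
    as 'more coherent'.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for later in range(1, len(top_words)):
        for earlier in range(later):
            d_earlier = doc_freq(top_words[earlier])
            if d_earlier == 0:
                continue  # word absent from the reference documents; skip the pair
            joint = doc_freq(top_words[later], top_words[earlier])
            score += math.log((joint + 1) / d_earlier)
    return score


def pick_k(scores, tolerance=0.0):
    """Return the number of topics at which coherence stops rising.

    `scores` is a list of (k, coherence) pairs sorted by k; the search stops at the
    first k whose gain over the previous best is no larger than `tolerance`.
    """
    best_k, best = scores[0]
    for k, c in scores[1:]:
        if c - best <= tolerance:
            break  # coherence flattens out or drops
        best_k, best = k, c
    return best_k


docs = [["strike", "union"], ["strike", "union", "wages"], ["church", "priest"]]
print(umass_coherence(["strike", "union"], docs))   # co-occurring pair
print(umass_coherence(["strike", "priest"], docs))  # never co-occurring pair
print(pick_k([(5, -2.0), (10, -1.5), (15, -1.4), (20, -1.45)], tolerance=0.05))  # → 15
```

Both functions only count co-occurrences and compare numbers: they say nothing about whether the grouped words are meaningful, which is precisely the gap that Chang et al. (2009) point to.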

In this chapter, I have discussed how the use of familiar notions to name 
computational techniques such as topic modelling, sentiment analysis and 
machine learning has increased their popularity while creating epistemological
expectations that these methods will disappoint. Especially when they are
used outside of their field of origin, the resulting confusion contributes to
obscuring the mathematical assumptions upon which these techniques are
built, such as the fundamental difference between discrete and continuous
modelling of information and the consequences stemming from it. In the context
of digital knowledge creation and in relation to the big data philosophy, I
reflected on the significant, yet often overlooked, implications for notions
of causality and correlation. I then applied these considerations to describe
the third use case of the book, analysis of a digital object, and used the 
properties and assumptions of topic modelling as the case example of a 
widely used computational technique that treats a collection of texts as 
discrete data. I have shown how the post-authentic framework can be used 
as the applied theory to engage critically with topic modelling by devoting 
special attention to the aspects of the analysis that are key for maintaining a 
symbiotic connection with the sources: pre-processing, corpus preparation 
and the number of topics. Specifically, I have shown how the application 
of the post-authentic framework to topic modelling acknowledges the
technique as correct at its core but problematic, and therefore in need of
critical engagement.
My intention is not to dismiss topic modelling as woefully inadequate,
but rather to encourage the integration of the method with critical scrutiny 
in order to address its limitations. In so doing, I have argued that by 
introducing a counter-narrative in the main scientistic discourse, the post- 
authentic framework strains the current system and can help us refigure 
a novel and more honest model for knowledge production in the digital. 
For example, when topic modelling is used for humanistic enquiry such 
as the analysis of cultural heritage material as discussed here, the post- 
authentic framework serves as a warning that the technique’s limitations 
are particularly significant and their impact on the provided interpretation 
of the past is problematic. I will return to these points in the next chapter 
in which I discuss the fourth and last use case of the book, visualisation 
of a digital object. Specifically, I will show how I have applied the post- 
authentic framework to prototyping a UI for topic modelling. I will insist 
on key aspects that aim to promote the active and reflective participation 
of the researcher in the process of digital knowledge production; I will 
devote particular attention to the added value of building UI elements 
that contribute to the urgent need for the establishment of critical data and 
visualisation literacy, especially when computational methods are adopted 
in fields outside of their original design. 


NOTES 


1. “Bloquer le pays ne permet pas d’endiguer l’épidémie” (“Locking down the country does not make it possible to contain the epidemic”).

2. For a detailed and in-depth historical discussion on causality in physics and 
philosophy, I refer the reader to Weinert (2005). 

3. Please note that not everyone agrees with this view and that there are still 
unanswered questions around causality, particularly in relation to discrete 
phenomena in quantum mechanics. See, for instance, Le Bellac (2006) and 
Jaeger (2009). 

4. A well-known phrase that synthesises this fact is ‘correlation does not mean 
causation’. 

5. Not previously annotated material. 
CHAPTER 5


What the Graph 


Figures don’t lie, but liars do figure. (Attributed to Carroll D. Wright, 1889) 


5.1 POKER (INTER)FACES 


Data visualisation and information visualisation are commonly used as 
synonyms but it has been argued that they in fact mean different things 
(Spence 2014; Falkowitz 2019; Ware 2021). The main difference would lie 
in the basic distinction between data and information in computer science: 
data is understood as raw materials (e.g., numbers), that is, the input, and 
believed to not carry any specific meaning per se, whereas information is 
the output, i.e., the meaning carried by a set of data. Thus, following this 
definition, information visualisation is understood as a cognitive activity 
(Spence 2014, 2), the process of discovering the meaning associated with 
a set of data, whereas data visualisation is the process of exploring data that 
may or may not uncover meaning, i.e., result in information visualisation. 
Another way to look at it is to consider the purpose of these two activities; 
data visualisation would essentially be a heuristic activity, whereas the 
main goal of information visualisation would be to influence a decision- 
making process (Falkowitz 2019). The two types of visualisations would 
accordingly translate into distinct products: data visualisations would allow
several levels of interaction (e.g., filtering, zooming, selecting, aggregating),
whereas information visualisations would simply show one or a limited number
of viewpoints while obscuring other perspectives more or less deliberately.
Thus, according to this logic, only data that function as cognitive tools
become information and therefore not all data is information.


© The Author(s) 2023 107
L. Viola, The Humanities in the Digital: Beyond Critical Digital
Humanities, https://doi.org/10.1007/978-3-031-16950-2_5

The post-authentic framework that I advance in this book argues against 
binary conceptualisations that misleadingly suggest and continue to perpet- 
uate the artificial notion of ‘raw data’, as if data could naturally pre-exist in 
a pristine, untouched environment, as if all the steps preceding the visual- 
isation, for example selection, collection, compilation, categorisation and 
storage, would not already be acts of interpretation and creation (Manovich 
2002; Gitelman 2013; Drucker 2020). The post-authentic framework 
therefore transcends the distinction between data and information and 
between data visualisation and information visualisation; it acknowledges 
that data always embed the interpretative dimensions that have originated
them. It also recognises that not just the processes of data creation but
also the very tools and methods adopted for creating data are equally situated,
limited and partial. Actions, tools, algorithms, platforms, infrastructures
and methods are never neutral because they themselves stem from systems 
that are in turn situated and therefore already interpreted. Hence, whether 
the intent is to explore data or to persuade through data, the post-authentic 
framework applied to visualisation advocates transparency in the way the data is
created and conclusions are drawn. In light of the considerations set out
in the previous chapters, I will therefore use these terms interchangeably
to signal that we need to move beyond the distinction between data and 
information and consequently between data visualisation and information 
visualisation because data is always produced to various degrees. 

Historically, innovations in data visualisation have originated from 
concrete, often practical goals (Friendly 2008, 30) so it is no surprise 
that the explosion of data of the last two decades and the subsequent
need to analyse and interpret it, together with advances in technology and
statistical theory, have greatly impacted the field. Indeed, as it is praised for
its capacity to promptly display emerging properties in the data as well as to 
enhance access, visualisation has increasingly become an integral part of the 
digital. For example, using information visualisation to better understand 
the complex, internal processes according to which ML models elaborate 
data and provide results has been shown to offer insights that may lead 
to more transparency and increased trustworthiness in ML outputs and it 
has therefore become very popular in recent years (Chatzimparmpas et al. 
2020). 
Visualisation has also gained a significant role in the context of analytical
methods, including topic modelling. Studies have argued that graphic 
display tools are valuable not only for understanding the models’ results 
but, because similarity measures and human interpretation are partially 
misaligned (cfr. Chap. 4), also for a general assessment of whether topic 
modelling is at all a suitable technique for AI and cognitive modelling 
applications (Murdock and Allen 2015, 4284). Several visualisation
solutions have therefore been proposed over the years to address
some of the already discussed challenges around topic modelling. These
can be roughly divided into two research directions: the use of visualisation 
to improve the interpretation of the results and, stemming from the first 
one, the use of visualisation to improve the results themselves. Solutions 
in the first category try to enhance topics’ interpretability by visualising 
the results in a variety of ways using different statistical measures. Termite 
(Chuang et al. 2012), for example, allows terms’ comparison within and 
across topics using saliency measures based on the concept of weight 
(cfr. Chap. 4), but it does not allow for document interactivity. Chaney 
and Blei (2012) propose a web-based interface to allow nontechnical 
users to navigate the output of a topic model but it is not possible to 
draw comparisons of the topics’ distribution across documents. Topic Nets 
(Gretarsson et al. 2012) visualises the relations between a set of documents 
(or parts of documents) and their discovered topics in the form of an 
interactive network-type graph (i.e., nodes and edges), but it does not 
show topic or document composition. LDAvis (Sievert and Shirley 2014) 
visualises terms within a topic according to weighted topic-word and topic- 
topic relationships but the connection with the documents is lost. Finally, 
Topic Explorer (Murdock and Allen 2015) builds on LDAvis by visualising 
topic-document and document-document relationships as well as topic 
distribution and document composition. 

Studies in the second category allow users to interact with the models 
through a variety of human-in-the-loop (HINTL) methods. For example,
iVisClustering (Lee et al. 2012) allows users to manually create or remove 
topics, merge or split topics and reassign documents to another topic while 
visualising topic-document associations in a scatter plot. Using ITM— 
Interactive Topic Modelling (Hu et al. 2014)—users can add, emphasise or 
ignore words within topics, whereas with UTOPIAN (Choo et al.
2013), users can adjust the weights of words within topics, merge and split 
topics and generate new topics. Hoque and Carenini (2016) and Cai and 
Sun (2018) also propose visual methods to curate the topics by adding 
or removing terms within a topic, adjusting the weight, merging similar
topics or splitting mixed ones, manually validating the results and finally 
generating new topics on the fly. 

As this brief literature review shows, one of the main challenges of topic 
modelling, namely interpreting the results, has so far been tackled from 
a problem-solving point of view for which the main task is essentially 
to best exploit the model’s identified document structure (Blei 2012). 
Significantly, what all these studies have in common is the implementation 
of visualisation techniques exclusively in the final stage of a topic modelling 
workflow, that is, either to interpret the algorithm’s output or when 
training the algorithm itself. What these visualisation interfaces clearly show 
is the persistent conceptual disconnection between the results and the pro- 
cesses that generated them, the common belief that only interventions on 
the algorithm or on the final output are worthy of study and examination 
and so interventions on the sources-data are dismissed as not immediately 
relevant. As I have argued in Chap. 3, these processes of manipulation are
often seen as ‘standard’, unproblematic and inconsequential rather than 
as heavy interventions on the sources and therefore on the results. The 
post-authentic framework that I propose in this book, on the contrary, 
strives to preserve and maintain the connection between the analyst and the 
digital object and it opposes any naïve conceptualisation of digital objects 
as finished, fixed, unproblematic entities. The post-authentic framework 
ultimately sees the human-digital object relationship as an essential compo- 
nent of the process of knowledge production in the digital. When applied 
to UI, the post-authentic framework is therefore not only mindful of such 
connection but it in fact encourages the scholar to be critically aware of it. 
My efforts towards building a post-authentic interface for topic modelling 
that I present here are therefore guided by this intention to enable users 
to actively engage with their digital sources and take ownership of their 
interventions but also to reflect on and critique them, thus openly
acknowledging the interpretative dimension of the digital research process. 

The endless flow of digitised material and the need to store it, access 
it and analyse it has impacted the role of visualisation also in those fields 
that traditionally relied on material sources, for instance, cultural heritage, 
history, linguistics and more widely the humanities. With specific reference 
to cultural heritage, for instance, institutions have over the years resorted 
more and more to visual means—typically web-based interfaces—as a way 
to enhance access to cultural collections for users’ appreciation as well as 
for research purposes (Windhager et al. 2019a). In a survey of information 
visualisation approaches to digital cultural heritage collections from 2014
to 2017, for example, Windhager et al. (ibid.) found that visualisations
of digital cultural heritage material have steadily increased, peaking in 
2015. At the same time, however, these authors also highlighted that 
the seventy visualisation systems, prototypes and platforms they surveyed 
were sharing ‘overly narrow task- and deficiency-driven approaches to 
interface design that are grounded in a simplistic user-as-consumer- and 
problem solver-model’ (ibid., 13). Drucker (2013; 2014; 2020) has also 
long argued that graphical displays in the humanities often reflect a
function- and task-driven UI design and generally lack a critical stance 
towards visualisation, evidencing IR intentions rather than the elicitation 
of curiosity, thoughtful engagement and reflection. 

The post-authentic framework that I advance in this book aims to 
contribute to the urgent need for the establishment of critical data lit- 
eracy, including visualisation literacy. It conceptualises digital objects as 
unfinished, situated processes, and it acknowledges the limitations, biases 
and incompleteness of tools and methods adopted for the analysis and 
visual representation of digital content. It provides helpful concepts for 
a re-theorisation of the process of digital knowledge creation, including 
the implementation of re-devised practices which are also acknowledged 
as always being adapted, unfixed, unfinished, arranged and interpreted. 
Applied to visualisations and interfaces, it acknowledges them as problem- 
atic endeavours that embed a wide net of situated processes, and it caters for 
their novel conceptualisation as epistemic objects which themselves carry 
meanings and therefore bear consequences. 

Post-authentic graphical displays counter what I call poker interfaces, 
attractive visualisations and sleek interfaces that tend to present infor- 
mation as detached from any subjectivity or which obscure or even 
break the connection with the digital object and the multiple layers 
of manipulation. In this chapter, I discuss two examples of how the 
post-authentic framework can be applied to visualisations; in Sect. 5.2, 
I examine prototypical work for designing a topic modelling interface 
whereas in Sect. 5.3, I present the design choices we made whilst developing
DeXTER, the interactive visualisation app to explore enriched cultural 
heritage material currently loaded with ChroniclItaly 3.0 (cfr. Sect. 2.4). 
My discussion will specifically revolve around the challenges of promoting 
symbiotic exchanges when engaging with software especially focusing 
on the efforts we made to expose, rather than hide, the ambiguities
and uncertainties of NA and SA. I end the chapter by acknowledging 
digital visualisation as fundamentally a curatorial operation which requires
countless subjective decisions that intervene on the digital object with
several layers of manipulation; the post-authentic framework to graphical 
display, I conclude, can guide the encoding of such processes in the 
visualisation. 


5.2 VISUALISATION OF DIGITAL OBJECTS: TOWARDS 
A POST-AUTHENTIC USER INTERFACE FOR TOPIC 
MODELLING 


The development of a post-authentic interface for topic modelling should 
be understood in the context of the wider project Digital History Advanced 
Research Projects Accelerator (DHARPA), within which software for DH
research is currently being developed. Originally conceived by Sean Takats, 
the DHARPA project today is a team of developers and academics who 
continuously contribute to each other’s expertise by sharing knowledge 
and practices from a range of disciplines (computer programming, data
engineering, data visualisation, linguistics, geography and various strains
of history) (Cunningham et al. 2022). Like DeXTER, DHARPA is hosted
at the C²DH (cfr. Chap. 2). At the heart of DHARPA is encoding criticism,
the effort of advocating the active and reflexive participation of the scholar 
in the process of digital knowledge production (Viola et al. 2021). Digital 
tools and techniques have been harshly criticised for alienating humanities 
scholars from their sources (ibid.) (cfr. Chap. 1), a bond regarded as crucial 
for the pursuit of scholarly enquiry; the driving rationale of DHARPA is 
that through critical assessment, contextualisation and documentation of 
digital methodologies—which are understood as partial and situated—such 
relationship can on the contrary be fortified and expanded. With this aim, 
DHARPA is developing software that operationalises critical epistemology
by placing the scholar-source relationship at its centre. The efforts towards 
building a post-authentic interface for topic modelling that I present 
here are therefore guided by the very same intention to enable users 
to actively engage with the digital object and take ownership of their 
interventions. Moreover, through the post-authentic lens, my aim is to 
openly acknowledge the interpretative dimension of the digital research 
process and thus to embed self-reflection and critique into both the
back-end and the front-end of the software. The confluence of the post-authentic
framework, DeXTER, DHARPA and the C²DH is a perfect example of how the
notions of symbiosis and mutualism can guide the process of knowledge
creation in the digital. 

The post-authentic framework opposes any conceptualisation of digital 
objects as something disconnected from the material sources; when applied 
to UI, it is therefore oriented towards safeguarding this connection and
encouraging the scholar to be critically aware of it. The example of the 
NLP software MALLET (McCallum 2002) illustrates a case in which this 
connection is obscured. MALLET is a widely used ML tool for a range 
of NLP tasks such as document classification, clustering, topic modelling, 
information extraction and others. During the steps of data preparation for 
topic modelling (cfr. Sect. 4.5), for example, the analyst is never prompted 
to view the results of their interventions and overall, there is little chance 
of interacting with the digital object. This does not intrinsically mean that 
any topic modelling analysis based on MALLET is to be discarded, but it 
does mean that a distance is imposed between the sources and the analyst. 
I argue that it is this distance that inevitably causes disconnection and 
increases the risk of attributing meaning to spurious patterns (cfr. Sect. 4.5.3).
Indeed, to ensure that the identified patterns carry actual significance, con- 
siderable efforts need to be subsequently directed towards regaining this 
connection, sometimes in the form of novel analytical methodologies such 
as the discourse-driven topic modelling approach (DDTM) we developed 
within OcEx (cfr. Sect. 2.4) (Viola and Verheul 2019b). This approach
integrates topic modelling with the discourse-historical approach (DHA) 
(Reisigl and Wodak 2001), an applied method of critical discourse analysis 
theory (van Dijk 1993) which triangulates linguistic, social and historical 
data to understand language use in its full socio-historical context and as a 
reflection of its cultural values and political ideologies (Viola and Verheul 
2019b). The integration of DHA into topic modelling is particularly useful 
for tasks such as topic interpretation and labelling, thus reducing the risk 
of attributing meaning to spurious patterns. 

Applied to interface design, the post-authentic framework strives 
to avoid the human-digital object disconnection by prompting critical 
engagement with the specificity of the source. Taking once again the 
example of ChroniclItaly 3.0, the post-authentic framework devotes careful 
attention to never losing contact with the information embedded in the
filenames themselves. Based on the Library of Congress cataloguing 
schema, the filenames carry valuable metadata information including 
the reference code of the newspapers’ titles, the page number and the 
publication date of each issue (Viola and Fiscarelli 2021a). The reason 
why it is so very important to critically engage with this information is
once more due to the specificity of the source. Immigrant newspapers 
were constantly on the verge of bankruptcy which caused titles to be 
often discontinued; for the same reason, some newspapers could afford 
to publish biweekly or even daily issues, while others could only publish 
intermittently (Viola and Verheul 2019a,b). This is naturally reflected in 
the composition of the collection; newspapers like L’Italia (one of the
most mainstream Italian immigrant publications in the United States at
the time) and Cronaca Sovversiva, the most important anarchist Italian
American newspaper, managed to continuously publish for years, whilst
others like La Rassegna or La Sentinella del West, which came into being
as small, personal projects of their founders, could only survive for a few
months. Although across the entire period of coverage, on the whole 
the collection holds a fair balance between the number of issues, the 
type of newspaper, the geographical location, the time span and political 
orientation of each title, the exploration of the collection’s metadata 
highlights factors such as over- or under-representation of some titles 
either on the whole or at specific points in time. Figure 5.1 displays how 
the issues are diversely distributed throughout the collection. 
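Exposing this kind of metadata can be sketched as parsing the filenames and counting issues per title per three-month period, the aggregation shown in Fig. 5.1. The text states only that the filenames encode the title's reference code, the page number and the publication date; the exact pattern used below (`<title code>_<YYYY-MM-DD>_ed-<edition>_seq-<page>.txt`), the calendar-quarter grouping and the placeholder title codes are assumptions made for illustration. Filenames that do not match are returned rather than silently dropped, so the analyst can see what was left out.

```python
import re
from collections import defaultdict

# Hypothetical filename pattern, assumed for illustration only:
# <title code>_<YYYY-MM-DD>_ed-<edition>_seq-<page>.txt
FILENAME_RE = re.compile(
    r"^(?P<title>[a-z]{1,3}\d+)_(?P<date>\d{4}-\d{2}-\d{2})"
    r"_ed-(?P<edition>\d+)_seq-(?P<page>\d+)\.txt$"
)


def parse_filename(name):
    """Extract title code, date, edition and page from a filename, or None."""
    m = FILENAME_RE.match(name)
    if m is None:
        return None
    year, month, day = map(int, m["date"].split("-"))
    return {"title": m["title"], "year": year, "month": month, "day": day,
            "edition": int(m["edition"]), "page": int(m["page"])}


def issues_per_quarter(filenames):
    """Count distinct issues (date + edition) per title per calendar quarter."""
    issues = defaultdict(set)
    unparsed = []
    for name in filenames:
        meta = parse_filename(name)
        if meta is None:
            unparsed.append(name)  # keep unmatched files visible to the analyst
            continue
        quarter = (meta["year"], (meta["month"] - 1) // 3 + 1)
        issue_id = (meta["year"], meta["month"], meta["day"], meta["edition"])
        issues[(meta["title"], quarter)].add(issue_id)  # pages of one issue collapse
    counts = {key: len(ids) for key, ids in issues.items()}
    return counts, unparsed


# Placeholder title codes, not real catalogue identifiers.
files = [
    "sn00000001_1910-01-05_ed-1_seq-1.txt",
    "sn00000001_1910-01-05_ed-1_seq-2.txt",
    "sn00000001_1910-02-12_ed-1_seq-1.txt",
    "sn00000002_1910-04-01_ed-1_seq-1.txt",
    "notes.txt",
]
counts, unparsed = issues_per_quarter(files)
print(counts)
print(unparsed)
```

Because several pages belong to the same issue, counting distinct (date, edition) pairs rather than files avoids over-representing titles that simply printed more pages.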

The application of the post-authentic framework to digital objects 
recognises that factors like the heterogeneity of the digital object may 
result in potential polarisation of topics and points of view; it therefore 
maintains a connection with the digital object by facilitating access to 
such information and allowing the researcher to engage critically with 
it. By embedding the option to explore the metadata information (if
present), the post-authentic framework signals its acknowledgement of
the continuous underlying structure of a digital object (cfr. Sect. 4.2),
which its digital transformation into discrete form, i.e., sequences of 0s
and 1s, hides. It is indeed this acknowledgement that allows the analyst to
obtain a fuller understanding of the object itself, in turn facilitating
fundamental tasks such as adjusting the research question, recalibrating
expectations and making sense of the results.

This sustained connection with the materiality of the source has
immediate relevance for computational techniques such as topic modelling.
As discussed in Sect. 4.4, the LDA algorithm assumes that a fixed number
of topics is represented, in different proportions, in all the documents;
this is clearly a rather artificial and unrealistic assumption, as it is
highly unlikely that one fixed (and to some extent arbitrary) number of
topics could adequately represent the content of all the ingested documents.
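For readers who want this assumption in formal terms, the standard LDA generative process can be written as follows (common textbook notation, not taken from this chapter): the number of topics $K$ is fixed in advance and shared by every document $d$, whose topic proportions $\boldsymbol{\theta}_d$ are drawn from a Dirichlet distribution over exactly those $K$ topics.

```latex
\begin{aligned}
\boldsymbol{\phi}_k &\sim \operatorname{Dir}(\boldsymbol{\beta}), && k = 1,\dots,K\\
\boldsymbol{\theta}_d &\sim \operatorname{Dir}(\boldsymbol{\alpha}), \quad \boldsymbol{\theta}_d \in \Delta^{K-1}, && d = 1,\dots,D\\
z_{d,n} &\sim \operatorname{Cat}(\boldsymbol{\theta}_d), && n = 1,\dots,N_d\\
w_{d,n} &\sim \operatorname{Cat}(\boldsymbol{\phi}_{z_{d,n}})
\end{aligned}
```

Because a Dirichlet draw has all $K$ components strictly positive (with probability one), every document is modelled as containing every topic in some proportion, however small; this is the sense in which the same $K$ topics are assumed to be present in all the documents.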
Fig. 5.1 Distribution of issues within ChroniclItaly 3.0 per title. Red lines indicate
at least one issue in a three-month period. Figure taken from Viola and Fiscarelli
(2021b)


Allowing the analyst to know that the material for the digital analysis is
unevenly distributed is a way to highlight those problematic aspects of
digital research and digital objects that precede the analysis itself but
nevertheless influence how the technique may be applied and how the
results are interpreted. Figure 5.2 shows how this step could be handled in
the interface. Once the documents are uploaded, the analyst is prompted
by a question asking them about the potential presence of metadata
information. The intention behind this question is to maintain contact with
the continuous aspect of the digital object that is hidden by its discrete
representation and further altered by the topic modelling algorithm, which
also treats the documents as a collection of discrete data.

Fig. 5.2 Wireframe of a post-authentic interface for topic modelling: sources
upload. The wireframe displays how the post-authentic framework to metadata
information could guide the development of an interface. Wireframe by the author
and Mariella de Crouy Chanel

If the analyst chooses ‘yes’, the metadata information would then be
used to create a dynamic, interactive visualisation inspired by the one
displayed in Fig. 5.1; this would display how the files are distributed in
the collection, ultimately creating room for reflection and awareness. In
the case of ChroniclItaly 3.0, for example, this visualisation displays the
number of published issues on a specific day, month or year and by
which titles; displaying this information allows the analyst to promptly
identify differences in the frequency of publication across titles and
potential gaps in the collection (Fig. 5.3). The post-authentic framework
to visualisation signals the importance of maintaining the connection with
the digital object, understood as an organic, problematic entity. Such
a connection is acknowledged as an essential element of the process of
knowledge creation in the digital, in that it favours a more engaged, critical
approach to digital objects and creates a space in which more informed
decisions can be made, ultimately answering the need for digital data
and visualisation literacy.
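The aggregation by day, month or year and the detection of gaps could work along these lines. This is a minimal sketch: issue records are assumed to be already parsed into `(title, date)` pairs, the 90-day threshold for a "gap" is an arbitrary illustrative choice, and the function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def counts_by(issues, granularity="month"):
    """Count issues per (period, title); granularity is 'day', 'month' or 'year'."""
    period_of = {
        "day": lambda d: d.isoformat(),
        "month": lambda d: f"{d.year}-{d.month:02d}",
        "year": lambda d: str(d.year),
    }[granularity]
    return Counter((period_of(d), title) for title, d in issues)


def find_gaps(issues, min_days=90):
    """List (title, last issue before gap, first issue after gap) for silences longer than min_days."""
    dates_by_title = defaultdict(list)
    for title, d in issues:
        dates_by_title[title].append(d)
    gaps = []
    for title, dates in dates_by_title.items():
        dates.sort()
        for prev, nxt in zip(dates, dates[1:]):
            if (nxt - prev).days > min_days:
                gaps.append((title, prev, nxt))
    return gaps


if __name__ == "__main__":
    # Invented toy records, for illustration only.
    toy = [("title_A", date(1900, 1, 6)), ("title_A", date(1900, 9, 1)), ("title_B", date(1900, 1, 20))]
    print(counts_by(toy, "year"))
    print(find_gaps(toy))
```

Switching the granularity changes what looks like continuity: a title with one issue a month appears steady at yearly resolution and patchy at daily resolution.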
Fig. 5.3 Post-authentic framework to sources metadata information display.
Interactive visualisation available at https://observablehq.com/@dharpa-project/timestamped-corpus.
Visualisation by the author and Mariella de Crouy Chanel


The post-authentic framework to interface design aims to make explicit,
at each stage of the digital knowledge creation process, the link between
the analyst, the digital object’s discretised continuous information and
the methods employed to manage, analyse and visualise it. Informed by
these motivations, an interface for topic modelling would facilitate close
engagement, for instance by allowing users to create and preview subsets of
the digital object (e.g., through filtering; cfr. Sect. 4.5.2) for further
exploration or to test hypotheses on a sample. In this way, the post-authentic
framework signals the rejection of objectivist and positivist understandings
of digital processes, which depict data as pre-existing and somewhat fixed.
The interface, on the contrary, would adopt a constructivist principle which
exposes the management of data as a problematic enterprise, a subjective
act made of constant interpretation, manipulation and decisions which
transform, select, aggregate and ultimately create data (Drucker 2011).
Following these principles, the wireframe in Fig. 5.4 displays how a
preview of the sources could be handled in the interface.
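Creating and previewing a subset can be illustrated with a small, hypothetical data model in which each document carries a title, a publication date and its text. The `Doc`, `make_subset` and `preview` names are illustrative only and are not part of any existing tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    title: str
    pub_date: date
    text: str


def make_subset(docs, titles=None, start=None, end=None):
    """Keep documents from the chosen titles within [start, end]; None means 'no constraint'."""
    return [
        d
        for d in docs
        if (titles is None or d.title in titles)
        and (start is None or d.pub_date >= start)
        and (end is None or d.pub_date <= end)
    ]


def preview(docs, n_chars=60):
    """One line per document so the analyst can inspect a subset before analysing it."""
    return [f"{d.title} | {d.pub_date.isoformat()} | {d.text[:n_chars]}" for d in docs]
```

Each filter is an explicit, visible decision. Leaving a parameter as `None` is itself a choice about what counts as the sample.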

Research that adopts computational techniques rarely acknowledges the
influential role of tools, infrastructures, software, categories, models and
algorithms on the research process or the results, as these are typically
assumed to be neutral. The researcher or curator often provides little or no
documentation of the decisions and the mechanisms that transformed their
sources into data (Viola and Fiscarelli 2021b). Through the chapters of this
book, however, I have demonstrated that transformative operations such as
those directed at the creation, enrichment, digital analysis and visualisation
of a digital object involve an intricate network of complex interactions
between countless elements and factors, including the materiality of the
sources, the digital object and the analyst, as well as between the operations
themselves. Although often presented as more or less ‘standard’, these
operations need, on the contrary, to be problematised and tackled critically.
The post-authentic framework to knowledge creation in the digital
acknowledges them as limited and situated, and it prompts a fundamental
rethink of how these operations impact the sources and produce a digital
object; this challenge, I maintain, can be met through engaged contact
with the digital object. For problematic operations such as pre-processing,
stemming and lemmatising (cfr. Sect. 4.5.2), this connection
can be sustained by prompting engagement, for instance by making
processes readily visible and intelligible to the analyst. The wireframes
in Figs. 5.5 and 5.6 show how these operations would be handled in
the interface. An expandable tool-tip asking ‘What is pre-processing?’,
together with information buttons located next to each operation, would give
users the possibility to access detailed explanations of the available
operations (often grouped under opaque labels such as ‘data cleaning’) and so
better understand the assumptions behind them. The UI would also allow data
preview, thus making the impact of each intervention visible and accessible
to the analyst. These features would create room for more conscious decisions
and, at the same time, they would signal that data is always made.
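The idea of making each pre-processing step visible can be sketched as a pipeline that records its intermediate output. The stopword list, the suffix-stripping rule and the lemma dictionary below are deliberately tiny, hypothetical stand-ins. Real projects would use curated resources and proper stemmers or lemmatisers. The point is only that every intervention leaves an inspectable trace, and that stemming and lemmatising transform the same tokens differently.

```python
import re

# Deliberately tiny, hypothetical resources for illustration only.
STOPWORDS = {"il", "la", "di", "e", "a", "che", "in"}
SUFFIXES = ("azione", "mente", "are", "ere", "ire", "i", "e", "o", "a")
LEMMAS = {"pubblica": "pubblicare", "notizie": "notizia", "giornali": "giornale"}


def lowercase(tokens):
    return [t.lower() for t in tokens]


def strip_punctuation(tokens):
    cleaned = (re.sub(r"[^\w]", "", t) for t in tokens)
    return [t for t in cleaned if t]


def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def crude_stem(tokens):
    """Cut the first matching suffix (a toy stand-in for a real stemmer)."""
    out = []
    for t in tokens:
        for suffix in SUFFIXES:
            if t.endswith(suffix) and len(t) - len(suffix) >= 3:
                t = t[: -len(suffix)]
                break
        out.append(t)
    return out


def toy_lemmatise(tokens):
    """Map word forms to dictionary headwords via a lookup table (unknown forms pass through)."""
    return [LEMMAS.get(t, t) for t in tokens]


def run_with_preview(text, steps):
    """Apply each step in order, recording its output so every intervention stays inspectable."""
    tokens = text.split()
    history = [("original", tokens)]
    for step in steps:
        tokens = step(tokens)
        history.append((step.__name__, tokens))
    return history
```

Run on the same sentence, the stemming branch yields truncated forms such as `giornal`, while the lemmatising branch yields headwords such as `pubblicare`. Seeing both side by side makes the choice between them an explicit decision.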

The post-authentic framework calls upon the scholar’s critical and 
active engagement in the process of knowledge creation in the digital 
and raises awareness of the limitations, biases and incompleteness of tools 
and methods; applied to interface design it can therefore contribute to 
the establishment of critical data management and visualisation literacy. 
Fig. 5.5 Interface for topic modelling: data pre-processing. The wireframe displays
how the post-authentic framework to UI could make pre-processing more
transparent to users. Wireframe by the author and Mariella de Crouy Chanel


Fig. 5.6 Interface for topic modelling: data pre-processing (stemming and
lemmatising). The wireframe displays how the post-authentic framework to UI could
make stemming and lemmatising more transparent to users. Wireframe by the
author and Mariella de Crouy Chanel

In the interface, this would be achieved by entering into a dialogue
with the researcher, for instance by asking the question ‘What is corpus
preparation?’ (Fig. 5.7); the combination of expandable tool-tips and
information buttons next to each operation would serve the dual purpose of
making the process of data creation more intelligible to users while
maintaining the connection with the digital object. Indeed, more transparent
processes enable a more conscious participation of the scholar in the fluid
exchanges between computational and human processes, which are understood as
part of a wider, complex system of interactions. The post-authentic framework
attempts to reach symbiosis and mutualism (cfr. Sect. 2.2) by making these
exchanges explicit, as opposed to a passive and dissociated consumption of such
interactions. To the same end, the output resulting from implementing
the different methods for corpus preparation would be saved each time
(left panel in Fig. 5.7) so that users could experiment with various methods
and settings, compare results and make more informed decisions. In this
way, the interface would actualise a counter-narrative to the dominant
positivist discourse that equates the removal of the human (which is in any
case illusory) with the removal of biases. On the contrary, the argument
I advance in this book is that it is only through the active and conscious
participation of the human in the processes of data creation, tool selection
and the implementation of methods and algorithms that such biases can in fact
be identified, acknowledged and, to an extent, addressed.
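Saving every corpus-preparation output together with its settings, so that runs can be compared, might look like the following. `RunLog` is a hypothetical helper, and deriving a run ID from a hash of the settings is one design choice among many.

```python
import hashlib
import json


class RunLog:
    """Keep every corpus-preparation output together with the settings that produced it."""

    def __init__(self):
        self.runs = []

    def record(self, settings, tokens):
        # A stable ID derived from the (key-sorted) settings: identical configurations share an ID.
        key = hashlib.sha1(json.dumps(settings, sort_keys=True).encode("utf-8")).hexdigest()[:8]
        self.runs.append({"id": key, "settings": dict(settings), "tokens": list(tokens)})
        return key

    def compare(self, id_a, id_b):
        """Vocabulary differences between two saved runs (first run found for each ID)."""
        a = next(r for r in self.runs if r["id"] == id_a)
        b = next(r for r in self.runs if r["id"] == id_b)
        va, vb = set(a["tokens"]), set(b["tokens"])
        return {"only_in_a": sorted(va - vb), "only_in_b": sorted(vb - va), "shared": len(va & vb)}
```

Because nothing is overwritten, an analyst can return to an earlier configuration and see exactly which words a later choice removed or merged.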

Fig. 5.7 Interface for topic modelling: corpus preparation. The wireframe displays
how the post-authentic framework to UI could make corpus preparation more
transparent to users. Wireframe by the author and Mariella de Crouy Chanel

The post-authentic framework to knowledge creation in the digital
advocates a more participatory, critical approach towards digital methods
and tools, particularly if they are applied for humanistic enquiry. Against a
purely correlation-driven big data approach, it offers a more complex and
nuanced perspective that challenges current views sidelining human agency
and criticality in favour of patterns and correlations. Applied to methods
such as topic modelling, for instance, the post-authentic framework
highlights the assumptions behind the technique, such as discreteness,
acausality, randomness and text disappearance. Whilst exploiting the new
opportunities offered by computational technologies, it rejects a passive
adoption of these methods, and it highlights the intrinsically dynamic,
situated, interpreted and partial nature of the digital, in contrast with the
dominant discourse that still presents techniques and outputs as exact, final,
objective and true. Applied to UI, it also provides helpful concepts for both
its theorisation and the implementation of re-devised visualisation practices,
which are likewise acknowledged as adapted, unfixed, unfinished, arranged and
interpreted.
5.3 DEXTER: A POST-AUTHENTIC APPROACH TO
NETWORK AND SENTIMENT VISUALISATION 


In the context of visualisation, questions of criticality, transparency, trust
and accountability have increasingly become part of the scientific discourse
(see for instance Gaver et al. 2003; Drucker 2011, 2013, 2014,
2020; Glinka et al. 2015; Sanchez et al. 2019; Windhager et al. 2019a;
Boyd Davis et al. 2021), and several recommendations for operationalising
critical digital literacy in visual design have been suggested. For example,
the interpretative and evaluative value of ambiguity for design has been
praised by Gaver et al. (2003); Drucker (2020) has proposed a framework
for visualisations that promotes plurality, critical engagement and data
transparency; Windhager et al. (2019a) have suggested design guidelines
that also promote contingency (i.e., acknowledging the incompleteness of
user experience) and empowerment (i.e., encouraging users’ self-activation
and engagement) (141); and Sanchez et al. (2019) have offered a framework
for managing uncertainty in DH visualisations. Despite increased
awareness, however, research in this area points out that intrinsic aspects
of knowledge creation such as ambiguity, uncertainty and errors are
still largely hidden from view, and that the majority of graphical
displays instead tend to be sleek visualisations that convey exactness,
neutrality and assertiveness, i.e., poker interfaces.

The post-authentic framework that this book suggests incorporates all
these recent perspectives; at the same time, however, because it refers to
the realm of digital knowledge that is created daily, it goes beyond them.
With specific reference to visualisations, the post-authentic framework
endorses ambiguity, uncertainty and transparency; it acknowledges the
incompleteness and partiality of data, tools and methods and, rather than
obscuring it, it exposes their potential untrustworthiness. It is thanks to
this awareness, I maintain, that the post-authentic framework helps to keep
the process of knowledge creation in the digital honest and accountable,
both for present and future generations. The visualisations for NA and SA
in the DeXTER app that I present here are a good example of how the
post-authentic framework can actualise these aims when visualising a
digital object.

The DeXTER project is a post-authentic research activity which combines
the creation of an enrichment workflow with a meta-reflection on the
workflow itself as well as the creation of an interactive app to visualise
enriched digital heritage collections. This means that the main intention
guiding its design is to provoke independent assessment (Gaver et al.
2003), to expose inconsistencies and cast doubts on the digital object and
to create a space for interpretation, rather than to provide one. This
includes openly acknowledging that the implementation and potential value
of the methods used are also inextricably intertwined with the specificity
of the source as well as with the research context of the related project.
For example, when enriching ChroniclItaly 3.0, we used NA and SA to explore
the various ways in which referential entities relate to each other in the
collection; this included modelling their frequency of co-occurrence in a
sentence and how this changes over time, the prevailing attitude towards
such entities, and connections between entities at specific points in time
(e.g., on the same day) across the different newspapers. These operations
aimed to maximise the potential value of using referential entities as
indicators or markers of identity (cfr. Chap. 3), that is, as a way to
navigate the process of Italian Transatlantic migration as it was narrated
by the different communities of Italian immigrants in the United States.
Far from being standard, techniques and methods are therefore understood
as adapted and chosen, and their suitability as something to be assessed
rather than assumed to be intrinsically good (or bad).
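Modelling entities' co-occurrence within a sentence, and how it changes over time, amounts to counting unordered entity pairs per date. This is a minimal sketch, assuming entities have already been recognised and grouped by sentence. The input layout and the toy data are hypothetical, not DeXTER's actual data format.

```python
from collections import Counter
from itertools import combinations


def cooccurrences(sentences):
    """Count, per date, how often two entities are mentioned in the same sentence.

    `sentences` is an iterable of (date string, list of entities found in that sentence).
    Pairs are unordered, and repeated mentions within one sentence count once.
    """
    counts = Counter()
    for day, entities in sentences:
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(day, a, b)] += 1
    return counts


def pair_over_time(counts, a, b):
    """Time series of co-occurrence counts for one entity pair, sorted by date."""
    a, b = sorted((a, b))
    return sorted((day, n) for (day, x, y), n in counts.items() if (x, y) == (a, b))
```

Even this small function embeds choices, such as counting a repeated mention once per sentence and treating pairs as unordered, that shape every graph built on top of it.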

The post-authentic framework can inform the selection of methods by 
warning the analyst that techniques developed in other fields for specific 
aims and with specific assumptions are not necessarily compatible across 
different data types. For example, NA is a method that originates in 
mathematics and graph theory (Biggs et al. 1986), and although it has 
long been applied across disciplines and for different purposes, it is typically 
used to answer questions mostly pertaining to the social sciences. This is 
because the underlying assumption is that the discrete modelling of how
actors (e.g., entities) relate to each other (i.e., edges) provides adequate
explanations of social phenomena. For a detailed overview of its application,
particularly in modern sociology, I refer the reader to Korom (2015).

Due to its characteristic feature of schematically representing abstract
and often ambiguous information, NA has also recently become popular
in the humanities. In linguistics, for example, NA has been applied to large
textual corpora of naturally occurring language to analyse the relationship
between language and identity in multilingual communities (Lanza and
Svendsen 2007) or to explore complex syntactic and lexical patterns as
networks, for example in language acquisition or language development
studies (Barceló-Coblijn et al. 2017). It has also been argued that NA
could be integrated in sociolinguistics as a way to provide insights into the
relationship between the use of linguistic forms and culture (Diehl 2019).
In branches of DH such as digital history and digital cultural heritage,
NA is also considered to be an efficient method to intuitively reduce
complexity (Düring et al. 2015). This may be because the technique
particularly benefits from attractive visualisations, which support
the impression that explanations for social events are accurate, complete,
detailed and scientific, naturally adding to the allure of using it.

However, a typically omitted yet rather critical issue of NA is that the
graphs can only display the nodes and attributes that are modelled; as
these stem from samples which by definition are incomplete and which
undergo several layers of manipulation, transformation and selection, the
conclusions the graphs suggest will always be partial and potentially based
on over-represented actors or, conversely, on under-represented social
categories. In the case of a digital object such as the cultural heritage
collection ChroniclItaly 3.0, which aggregates heterogeneously distributed
sources (cfr. Sect. 5.2), this issue is particularly significant, as any
resulting graph depends on the modelled newspaper (e.g., mainstream vs
anarchist), on the type and number of entities included and excluded and on
the attributes’ variables (e.g., frequency of co-occurrence, number of
relations, sentiment polarity), to name but a few. Each one of these factors
can dramatically influence the network displays and consequently affect the
interpretation of the past that they provide.
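To make concrete how much a network depends on such modelling choices, the sketch below builds edge weights from sentence-level co-occurrence and lets the selected titles and a minimum-weight threshold vary. The data layout, the title labels and the `min_weight` parameter are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def build_edges(sentences, titles=None, min_weight=1):
    """Edge weights from sentence-level co-occurrence, restricted to the selected titles.

    `sentences` is an iterable of (title, list of entities in one sentence).
    Edges lighter than `min_weight` are dropped, as a display threshold would do.
    """
    weights = Counter()
    for title, entities in sentences:
        if titles is not None and title not in titles:
            continue
        for a, b in combinations(sorted(set(entities)), 2):
            weights[(a, b)] += 1
    return {edge: w for edge, w in weights.items() if w >= min_weight}
```

The same three sentences produce three different graphs depending on which title is modelled and where the threshold is set. None of these graphs is the network.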

The project’s GitHub repository, which is to be understood as an integral
part of the visualisation interface, is a good example of how the
post-authentic framework can guide the actualisation of principles of
transparency, accountability and reproducibility and how it values ambiguity
and uncertainty. DeXTER’s GitHub repository documents, explains
and motivates all the interventions on the data, including reporting on the
processes of entity selection (cfr. Sect. 3.3). The aim is to warn the analyst
that, despite being (too) often presented as a statement of fact, a visually
displayed network is a mediated and heavily processed representation of the
modelled actors. As such, the post-authentic framework does not solely
aim to increase trust in the data and how it is transformed, but also
to acknowledge uncertainty in both the data lifecycle and the resulting
graphs and, finally, to expose and accept how these may be untrustworthy
(Boyd Davis et al. 2021, 546). The act of making explicit the interpretative
work of shaping the data is what Drucker calls ‘exposing the enunciative
workings’ (2020, 149):
For data production, the task is to expose some of the procedures and steps
by which data is created, selected, cleaned, and processed. Retracing the 
statistical processes, showing the datamodel and what has been eliminated, 
averaged, reduced, and changed in the course of the lifecycle would put the 
values of the data into a relative, rather than declarative, mode. This is one of 
the points of connection with the interface system and task of exposing the 
enunciative workings. 


By acknowledging that the displayed entities are not all the entities in
the collection but in fact a representative, yet small, selection, DeXTER
encourages close engagement with the NA graphs; it does not try to
remove uncertainty but points to where it is. At the same time, it recognises
the management of data as an act of constant creation, rather than a mere
observation of neutral phenomena. For example, the process of entity
selection as I described it in Sect. 3.4 created a subset of the most
frequently occurring entities, distributed proportionately across the different
newspapers. With this intervention, we aimed to alleviate the issue of source
over-representation due to some titles being much larger than others
and to reduce complexity in the resulting network graphs, notoriously
considered a downside of NA. At the same time, however, this
intervention may cause the least frequently occurring entities to be
under-represented in the visualisations. Thus, the transparent and detailed
documentation of how we intervened on the data that originates the NA
visualisations counterbalances the illusion of neutrality and completeness
often conveyed by ultra-polished NA visualisations.
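One way to read "most frequently occurring entities distributed proportionately across the different newspapers" is as a per-title quota proportional to each title's share of mentions. The sketch below implements that reading. It is an assumption about the procedure, not a reproduction of the DeXTER workflow, whose documented steps are in the project repository. The titles and entities are toy data.

```python
from collections import Counter


def select_entities(mentions, budget):
    """Choose frequent entities per title, each title's quota proportional to its share of mentions.

    `mentions` maps a title to the list of entity mentions found in it.
    Every non-empty title keeps at least one entity; note that round() rounds halves to even.
    """
    total = sum(len(m) for m in mentions.values())
    if total == 0:
        return {}
    selection = {}
    for title, ents in mentions.items():
        if not ents:
            selection[title] = []
            continue
        quota = max(1, round(budget * len(ents) / total))
        selection[title] = [e for e, _ in Counter(ents).most_common(quota)]
    return selection
```

In the test data, the rarer entities of each title drop out of the selection. This is the under-representation trade-off the paragraph above describes.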

Another issue of NA data modelling concerns the theoretical assumption 
upon which the technique is based. As a bare minimum, a network visual- 
isation connects nodes through a line (i.e., edge) that carries information 
on the type of relation between the nodes (i.e., attributes). Nodes are 
understood as discrete objects, i.e., completely independent from each 
other (cfr. Chap. 4); this ultimately means that the nodes are modelled 
to remain stable and that the emphasis is on the relations, as these are 
believed to provide adequate explanations of social phenomena. However,
this type of modelling arguably paints a rather artificial picture of both
the phenomena and the actors, who remain unaffected by the changing
relationships between them. To put it in Drucker’s words:


This is a highly mechanistic characterization of nodes (and edges), whether 
they consist of human beings, institutions, or events which reduce[s] all 
relationships to the same presentation and make[s] static representations out
of dynamic conditions. (2020, 180) 


NA effectively transforms continuous (i.e., inseparable) elements such as
cultural actors into discrete and fixed points; this transformation is further
modelled visually, giving the impression of a neutral, exact and observable
description of their entanglement. The possibility of historicising actors and
relations in DeXTER is a concrete example of how the post-authentic
framework to NA aims to counteract this inevitably artificial ‘flattening
effect’. When developing DeXTER’s interface, we decided to model
the data points displayed in the graphs according to several parameters
and attributes that reflect a conceptualisation of networks as lively and
dynamic structures. By sliding the time bar (cfr. Fig. 5.8), the analyst can,
for example, observe not just how the relationships between entities change
over time but also how the entities themselves change. It is possible, for
instance, to explore how entities of interest were mentioned by migrants over
time by selecting/deselecting specific titles (cfr. Fig. 5.9) of different
political orientation and geographical location, or by selecting the frequency
rate and sentiment polarity (cfr. Fig. 5.10) to observe the prevailing emotional
attitude of the sentences in which the entities were mentioned together as
well as their frequency of occurrence.

Fig. 5.8 DeXTER default landing interface for NA. The red oval highlights the
time bar (historicise feature)
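The time bar can be thought of as a date window applied before edges are aggregated. Because nodes are derived from the edges inside the window, the entities themselves appear and disappear as the window moves. This is a minimal sketch with a hypothetical `(date, entity_a, entity_b)` record layout and toy data.

```python
from datetime import date


def edges_in_window(dated_edges, start, end):
    """Aggregate edge weights for co-occurrences whose date falls in [start, end].

    `dated_edges` is an iterable of (date, entity_a, entity_b); pairs are unordered.
    """
    weights = {}
    for d, a, b in dated_edges:
        if start <= d <= end:
            key = tuple(sorted((a, b)))
            weights[key] = weights.get(key, 0) + 1
    return weights


def nodes_of(weights):
    """The entities present in a window, derived from its edges."""
    return sorted({n for edge in weights for n in edge})
```

Moving the window from one decade to another can change not only the edge weights but also which actors exist in the graph at all.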

Fig. 5.10 DeXTER default landing interface for NA. The red ovals highlight the
frequency and sentiment polarity parameters

By visualising both entities and relations and by creating dynamic and
interactive NA visualisations, the DeXTER interface on the whole aims
to provide several viewpoints on the same data, and it shows
how several dimensions of observation dramatically affect the graphical
arrangements. In the case of the historicisation feature, for example, as the
data is modelled in reference to the documents’ timestamp, the analyst can
swipe the time bar on the top left of the interface to explore the changing
relationships between entities over time and/or at specific intervals. This
adds a historical dimension to the networks and allows the analyst to
observe and engage with changes in the graphs interactively, as they reflect
how the displayed entities were mentioned by migrants according to
changing temporal parameters. We also added informative tool-tips next
to each available option to encourage close engagement with the interface,
with the process of data creation, with the method of NA itself and with the
meanings offered by these parameters (Gaver et al. 2003).

The post-authentic framework conceptualises ambiguity and uncertainty
as intrinsic elements of knowledge creation in the digital; thus, rather
than rejecting or obscuring them, it preserves them as opportunities
to reduce reliance on potentially biased methods and, on the whole, to
remind us of the illusion of certainty (Edmond 2019). Applied to NA, this
means creating a space for interpretation, for instance by exposing the
multi-dimensional complexity of the data (Windhager et al. 2019b; Drucker 2020).
In the DeXTER interface, this was implemented by providing
multiperspectivity on the same nodes. DeXTER allows users to explore three
types of networks: two entity-focused graphs (i.e., egocentric networks)
and one issue-focused network. We decided to visualise the networks
as egocentric networks for two reasons. Egocentric networks are local
networks with one central node, known as the ego. This type of network
visualises all the nodes directly connected to the ego, i.e., the alters.
Crossley et al. (2015) suggest that one main advantage of egocentric
networks is that they allow for rich visualisations even when not all the
entities in a dataset can be mapped because of the network’s large size, which
is indeed the case of ChroniclItaly 3.0, as discussed in Chap. 3. Furthermore,
the extensive information provided about the ego may offer a personal
perspective on the node and the alters; indeed, thanks to this property,
egocentric networks are often referred to as cognitive networks (Perry et al.
2018). We therefore chose egocentric network visualisations for their potential
ability to provide relevant material for the study of migration as experienced
and narrated by the migrants themselves. Starting from an entity
of their choice, users can explore several parameters: the net of entities
most frequently mentioned in the same sentence as the ego, the prevailing
emotional attitude in those sentences, the number of times entities were
mentioned together and the titles in which they were mentioned. This
information is encoded and made available to the analyst both through
pop-up tool-tips and through the colour of the edges (i.e., pastel blue
for negative sentiment, white for neutral and pastel red for positive).
Figure 5.11 shows the egocentric network for the GPE entity sicilia
(Sicily) across all the titles of the collection as mentioned in sentences
with prevailing positive sentiment. If the ego-network option is not selected,
the graph additionally displays the relations among the alters. As shown in
Fig. 5.12, the representation of relations can react significantly to the
tiniest modification of parameters (Windhager et al. 2019b); even when the
same node is selected, the overall perspective offered on the relational
structure of the graph can change significantly.
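An egocentric network keeps the ego, its alters and the edges between ego and alters. The optional alter-to-alter ties correspond to the view in which the ego-network option is not selected. This is a minimal sketch over a plain dictionary of edges. The data structure, the function name and the toy edges are assumptions.

```python
def ego_network(edges, ego, include_alter_ties=False):
    """Return (sorted nodes, kept edges) of the egocentric network around `ego`.

    `edges` maps (a, b) pairs to attributes such as weight or sentiment.
    With include_alter_ties=True, edges among the alters are kept as well.
    """
    alters = {b if a == ego else a for (a, b) in edges if ego in (a, b)}
    nodes = alters | {ego}
    kept = {}
    for (a, b), attrs in edges.items():
        touches_ego = ego in (a, b)
        among_alters = a in alters and b in alters
        if touches_ego or (include_alter_ties and among_alters):
            kept[(a, b)] = attrs
    return sorted(nodes), kept
```

With the same ego selected, toggling `include_alter_ties` changes the relational structure on display. This illustrates the sensitivity to small parameter changes noted above.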

Fig. 5.11 DeXTER: egocentric network for the node sicilia across all titles in the
collection in sentences with prevailing positive sentiment

The third type of network visualisation (i.e., issue-focused network)
allows the exploration of entities starting from a specific issue. Whereas in
an egocentric network users observe a network which has an actor/entity
of their choice as the focal node, this third visualisation displays the actors
mentioned in specific newspapers on specific days. In this way, the
issue-focused network offers an additional perspective on the same digital
object, potentially contributing valuable insights for the analysis of how
events and actors of interest were portrayed by migrants of different political
affiliations based in different parts of the United States. Thus, instead of
offering one obvious meaning, DeXTER offers multiple perspectives, and
by capturing heterogeneous contexts, it creates a tension that the analyst
is encouraged to resolve through independent assessment (Gaver et al.
2003). Figure 5.13 shows the default issue-focused network graph.
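The issue-focused view can be sketched as grouping the entities mentioned on a given day by newspaper title, then separating the actors that all selected titles share from those specific to one title. The record fields (`title`, `date`, `entities`) are a hypothetical layout, and the records in the test are toy data.

```python
from collections import Counter, defaultdict


def actors_on_day(records, day, titles=None):
    """Entities mentioned on `day`, grouped by newspaper title (optionally only selected titles)."""
    by_title = defaultdict(Counter)
    for r in records:
        if r["date"] != day or (titles is not None and r["title"] not in titles):
            continue
        by_title[r["title"]].update(r["entities"])
    return {t: dict(c) for t, c in by_title.items()}


def shared_and_distinct(by_title):
    """Actors mentioned by every title that day, and actors specific to each title."""
    sets = {t: set(c) for t, c in by_title.items()}
    shared = set.intersection(*sets.values()) if sets else set()
    distinct = {t: sorted(s - shared) for t, s in sets.items()}
    return sorted(shared), distinct
```

Setting shared and title-specific actors side by side is one simple way to surface the tension between heterogeneous contexts, rather than resolving it for the analyst.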
DeXTER’s visualisation of sentiment as an attribute of NA is also guided 
by post-authentic principles. As already discussed in Sect. 3.4, SA is a com- 
putational technique that aims to identify the prevailing emotional attitude, 
i.e., the sentiment, in a given text (or portions of a text); the sentiment is
then typically categorised according to three labels, i.e., positive, negative
or neutral. A problematic aspect of the technique is that it presents these
labels as unambiguous, universally accepted categories, providing a neutral
and observable description of reality, and obscuring the highly problematic,
interpretative process through which such categories are established
(cf. Sect. 3.4) (Puschmann and Powell 2018). The concept of
‘sentiment score’ additionally reinforces the illusion of objectivity, and it
further obfuscates the inherently vague, profoundly subjective dimension
of emotions and their definitions, a dimension intrinsically open to multiple
interpretations and subject to ambiguity. As a way to acknowledge the
ambiguities of the assumptions behind the technique and of a ‘sentiment
score’, DeXTER’s graph colouring scheme is fluid and nuanced (as
opposed to solid colours): the colour gradients go from a darker shade
of blue for the lowest score (i.e., negative) to a darker shade of red for
the highest score (i.e., positive). DeXTER’s visual representation of
sentiment results in a deliberately blurred graph: the borders of the edges
are purposely smudged and pale, and pastel shades are preferred over
bright, solid shades; the aim is to openly acknowledge SA as ambiguous,
situated and therefore open to interpretation, rather than precise, neutral
and certain. By exposing these inconsistencies, post-authentic visualisations
more broadly question the dominant positivist discourse around technology.
We pursued this goal by transparently documenting how we
identified the sentiment categories, how we aggregated the results, how
we conducted the classification, how we interpreted the scores and how
we rendered them in the visualisation. This documentation is available in
the dedicated, openly accessible GitHub repository, which also includes the
code, links to the original and processed material and the files documenting
the manual interventions.

Fig. 5.12 DeXTER: network for the ego sicilia and alters across titles in the
collection in sentences with prevailing positive sentiment

Fig. 5.13 DeXTER: default issue-focused network graph
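The tension between a continuous score and three discrete labels, and the choice of a graded rather than solid colour scheme, can be made concrete with a short sketch. It is not DeXTER's code: the score range of [-1, 1], the ±0.05 neutral band and the RGB endpoints are illustrative assumptions. The cut-off in the hypothetical `label` function is a human decision that produces seemingly crisp categories, whereas the hypothetical `edge_colour` function keeps the gradation visible and uses a low opacity so that edges stay pale.

```python
from typing import Tuple

# Illustrative endpoints (assumptions): a darker blue for the lowest score,
# a darker red for the highest score and a pale neutral midpoint.
DARK_BLUE = (33, 70, 140)
PALE_MID = (235, 230, 225)
DARK_RED = (150, 40, 40)


def label(score: float, neutral_band: float = 0.05) -> str:
    """Collapse a continuous score in [-1, 1] into one of three labels.

    The neutral band is an assumed threshold: any fixed cut-off turns a
    graded, ambiguous value into an apparently unambiguous category.
    """
    if score >= neutral_band:
        return "positive"
    if score <= -neutral_band:
        return "negative"
    return "neutral"


def _lerp(a: Tuple[int, int, int], b: Tuple[int, int, int], t: float) -> Tuple[int, int, int]:
    # Linear interpolation between two RGB colours, with t in [0, 1].
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))


def edge_colour(score: float, alpha: float = 0.45) -> str:
    """Map a score in [-1, 1] onto a blue-to-red gradient as an RGBA string.

    The hue shifts gradually instead of jumping between solid colours, and
    the low alpha keeps the edges pale rather than bright.
    """
    s = max(-1.0, min(1.0, score))  # clamp out-of-range scores
    if s < 0:
        r, g, b = _lerp(PALE_MID, DARK_BLUE, -s)
    else:
        r, g, b = _lerp(PALE_MID, DARK_RED, s)
    return f"rgba({r}, {g}, {b}, {alpha})"


for score in (-0.8, -0.02, 0.3):
    print(score, label(score), edge_colour(score))
```

Note that a score of -0.02 is labelled 'neutral' but still receives a faintly blue tint: the gradient preserves information that the three labels discard.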
Finally, guided by the post-authentic framework, DeXTER emphasises
the continuous making and re-making of data; this process of forming,
arranging and interpreting data is encoded within the interface itself.
Through the ‘Data’ tab, users can at any point access and download the
data behind the visualisations, as determined by the filters and
parameters they have selected (e.g., title, time interval, frequency, entity).
The intention is to disrupt traditional notions that conceptualise data as
fixed, unarguable and defined. At the same time, DeXTER acknowledges the
collective responsibility of building a source of knowledge for current and
future generations, and it frames the process of knowledge creation in the
digital as accountable, unfinished and receptive to alternatives.
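The principle that the downloadable data should mirror the user's current selection can be sketched as a filter followed by an export. The record layout (`EntityMention`), the functions `select` and `to_csv` and the sample rows are hypothetical and only illustrate the idea; they do not reproduce DeXTER's implementation.

```python
import csv
import io
from dataclasses import dataclass, asdict
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class EntityMention:
    # Hypothetical record: one entity's mention count in one newspaper issue.
    title: str
    issue_date: date
    entity: str
    frequency: int


def select(records: Iterable[EntityMention], title: Optional[str] = None,
           start: Optional[date] = None, end: Optional[date] = None,
           entity: Optional[str] = None, min_frequency: int = 1) -> List[EntityMention]:
    """Apply the filters a user set in the interface (None means 'no filter')."""
    out = []
    for r in records:
        if title is not None and r.title != title:
            continue
        if start is not None and r.issue_date < start:
            continue
        if end is not None and r.issue_date > end:
            continue
        if entity is not None and r.entity != entity:
            continue
        if r.frequency < min_frequency:
            continue
        out.append(r)
    return out


def to_csv(rows: List[EntityMention]) -> str:
    """Serialise exactly the rows behind the current view so they can be downloaded."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "issue_date", "entity", "frequency"])
    writer.writeheader()
    for r in rows:
        writer.writerow(asdict(r))  # dates are written in ISO format via str()
    return buf.getvalue()


# Placeholder data, invented for illustration.
records = [
    EntityMention("Title A", date(1910, 5, 1), "sicilia", 4),
    EntityMention("Title A", date(1925, 3, 2), "sicilia", 1),
    EntityMention("Title B", date(1910, 6, 1), "sicilia", 3),
]
view = select(records, title="Title A", start=date(1900, 1, 1),
              end=date(1915, 12, 31), entity="sicilia")
print(to_csv(view))
```

Because the export is computed from the same selection that drives the graph, changing any filter changes the data a user receives, which keeps the act of selection visible rather than hidden.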

Through the exploration of several case studies, i.e., the creation, 
enrichment, analysis and visualisation of a digital object, this book argues 
that new theoretical paradigms are now urgently required; these must be 
centred on a reconceptualisation of digital objects as epistemic objects 
which themselves carry meanings and which therefore alter the perception 
of knowledge created in a digital environment. With specific reference to 
visualisations, interfaces and graphic display, the post-authentic framework 
that I propose in this book acknowledges them as problematic endeav- 
ours embedding a wide net of situated processes which require more 
systematic and sophisticated criteria than over-simplistic user-as-consumer
and user-as-problem-solver models (Windhager et al. 2019a). Recognising
such complexities means accepting, and in fact embracing, digital knowledge
creation practices as embedded in extremely convoluted networks of countless
factors which can be neither fully trusted nor predicted. The post-authentic
framework therefore recognises the limitations and biases of
specific tools and techniques and exposes problematic processes such 
as data creation, selection and manipulation by openly disclosing their 
complexities and lifecycle, by thoroughly documenting the decisions and 
actions and by allowing users to access the data behind the visualisations, 
including making the acts of transformation explicit. 

In the post-authentic interface DeXTER, we put this into practice by providing
a space for interpretation and individual assessment, by favouring
multi-perspectivity through different types of network visualisations and by
offering dynamic and interactive graphs. This arguably also mitigates the
problem of network graphs displaying artificial pictures of social phenomena,
a consequence of the technique’s intrinsic properties, whereby actors remain
stable and unaffected by their relations. While I am not implying that a
post-authentic framework is the perfect approach to digital knowledge creation
practices, I do argue that, by redefining our understanding of the theoretical
dimensions of digital objects, tools, techniques, platforms, interfaces and
infrastructures, especially for humanistic enquiry, the framework offers
theoretical and methodological criteria that recognise the larger cultural
relevance of digital objects, and it provides an urgently needed architecture
for issues such as transparency, replicability, Open Access, sustainability,
data manipulation, accountability and visual display.


NOTES 


1. A human-in-the-loop method requires human and machine intelligence to
create machine learning models. In this approach, humans interact with the
algorithm during training by tuning and testing it.

2. https://github.com/DHARPA-Project

3. https://github.com/lorellayv/DeXTER-DeepTextMiner
CHAPTER 6


Conclusion 


Philosophers until now have only interpreted the world in various ways. 
The point, however, is to change it. (Karl Marx, 1846) 


As technology changes, society changes and so the way society produces 
knowledge and culture also changes. Yet, the predominant model of 
knowledge production continues to be one bound to the epistemology of 
last century’s industrial societies. In this book, I argued that to respond 
to the radical changes brought by the digital transformation of society 
and aggravated by the 2020 pandemic, the current model of knowledge 
creation must urgently be re-theorised. This means, I contended, pushing
beyond mere observations of how higher education has been transitioning
towards the digital and recognising that a more fundamental question needs
to be asked. For example, it is no longer sufficient to reflect on how
the digital transformation has required teachers to rapidly acquire digital
skills to adapt and rethink their teaching methods, or how the digital has
affected branches of knowledge (e.g., humanities) or individual disciplines
(e.g., history), or how differently academics now think about sharing their
research findings (e.g., with end-users), or how their research is increasingly
dominated by data rather than by sources, including having to consider
issues of storage, archiving, transparency, etc. A different critical awareness
is now required: the shift has been in, as opposed to towards, the digital.

Claiming that the shift has been in the digital acknowledges conclusively
that the digital is now integral not only to society and its functioning,
but crucially also to how society produces knowledge and culture. My
argument for a new model of knowledge production therefore starts
from recognising that persisting binary modulations in relation to the
digital (for example, between digital knowledge creation and non-digital
knowledge creation) are no longer relevant in that they continue to
suggest artificial divisions. Such divisions, I contended, not
only slow down progress and hinder knowledge advancement but, by
fragmenting expertise, they sustain a model of knowledge that does not
adequately respond to a reality made more complex by the digital. It has been
the argument of this book that the digital transformation of society
requires a more problematised understanding of the digital as an organic
entity that brings multiple levels of complexity to reality, many of which
have unpredictable consequences. Our traditional model of knowledge
creation based on single-discipline perspectives, hierarchical divisions and
competition is no longer suited to meet the unprecedented challenges
facing societies in the digital.

In this book, I developed a new theoretical and methodological frame- 
work, the post-authentic framework, which critiques dominant positivistic 
and deterministic views of technology and computational methods and 
offers new terminologies, concepts and approaches in reference to the 
digital, digital objects and practices of knowledge production in the digital. 
The post-authentic framework breaks with dialectical principles of dualism 
and antagonism and with the rigid model of knowledge creation that 
divides knowledge into disciplines and disciplines into two areas: the 
sciences and the humanities. Dualistic notions of this kind, I argued, are
complicit in an assiduously cultivated discourse that has historically exalted
digital methods as exact, rigorous, neutral, more relevant and more
funding-worthy than critical approaches. This includes the cosy and reassuring
myths that data is unarguable, bias-free, precise and reliable, as opposed 
to sources and human consciousness which have been more and more 
sidelined as carriers of biases, unreliability and inequality. 

My reframing of the digital through the post-authentic framework
helps us recognise that we can no longer afford to perpetuate the narrative
simplification around computational techniques and the sidelining of human
consciousness, because knowledge does not respect the limits of disciplines
and the implications of being in the digital transcend such artificial
boundaries. This is a reality we can no longer ignore and which can only be
confronted through a reconfigured model of knowledge creation that would
reconceptualise it as happening in the digital. The world has entered a new
dimension in which higher education can no longer afford to opportunistically
see technology and its production as instrumental and contextual to knowledge
and teaching, or simply as an object of critique, admiration, fear or envy. The
post-authentic framework that I proposed in this book functions as a radical
critique of such outdated conceptualisations of the digital and argues that
the current model of knowledge creation, with its established boundaries
between disciplines and specialisations, is not suited to respond to the
complex challenges of a world in the digital.

Instead, the framework advocates a notion of knowledge as fluid, in
which differences are not rejected but welcomed according to the principles
of symbiosis and mutualism (cf. Sect. 2.2). Symbiosis and mutualism
oppose models of reality that promote individualism and separateness, which
inevitably lead to conflict and competition; one such model of reality
is the division of knowledge into monolithic disciplines. Borrowed from
biology, the concept of symbiosis breaks with the current conceptualisation
of knowledge as separate, linear and fragmented into multiple disciplines
and with that of the digital as a static, inconsequential entity. On the
contrary, symbiosis evokes ideas of close and long-term cooperation between
different organisms and of the continual renegotiation of interactions among
past, present and future systems; power relations; infrastructures;
interventions; curations and curators; programmers and developers.

Mutualism opposes interspecific competition, that is, when organisms
from different species compete for a resource, so that only one of the
actors involved benefits. I maintained that our model of knowledge
creation based on hierarchical separations between disciplines resembles an
interspecific competition dynamic, as it has forced knowledge production
to operate within a space of conflict and competition. This model, I
contended, is outdated and inadequate: it traps curiosity in rigid categories,
and it is unsuited to rethinking and explaining the transformative effect the
digital is having on our culture and society; to use Virginia Eubanks’ words,
it contributes to automating inequality, and it can therefore make society
worse. I therefore argued that any re-modulation still operating within the
current disciplinary model of knowledge creation is no longer sufficient;
to this end, I proposed the notions of symbiosis and mutualism to help
us reconceptualise knowledge as fluid and inseparable. Symbiosis and
mutualism shape a model in which curiosity is finally given long-overdue
free rein, in which the different areas of knowledge do not compete against
each other but benefit from a mutually compensating relationship. When
asking ourselves the questions ‘How do we produce knowledge today?’
and ‘How do we want our next generation of students to be trained?’, the
concepts of symbiosis and mutualism may guide our answers.

Symbiosis and mutualism are also central to the development
of a more problematised conceptualisation of digital objects and digital 
knowledge production. The post-authentic framework re-examines the 
digital as situated and partial, an extremely convoluted assemblage of 
factors and actors, themselves part of wider networks of situated com- 
ponents, processes and mechanisms of interaction and the various forms 
of power embedded in computational processes and beyond. As such, 
far from being mere immaterial copies of the originals, digital objects 
are acknowledged as bearing consequences which transcend traditional 
questions of authenticity; digital objects are never finished, nor can they
be; countless versions can endlessly be created through processes that
are shaped by past decisions and in turn shape the following ones. Thus, the 
post-authentic framework engages with both products and processes which 
are understood as never neutral, as incorporating external, situated systems 
of interpretation and management and therefore bearing consequences 
which go beyond the object-centred culture of authenticity. 

To exemplify this complex conflation of humans, entities and processes,
and of past, present and future experiences, I used ChroniclItaly 3.0,
a digital cultural heritage collection of Italian American newspapers pub- 
lished between 1898 and 1936. Specifically, I examined and illustrated how 
the application of the post-authentic framework can inform the creation, 
enrichment, analysis and visualisation of a digital object. By redefining our 
understanding of both the conceptual and concrete dimensions of digital 
objects, tools and techniques, the post-authentic framework provides 
theoretical and methodological criteria that recognise the larger cultural 
relevance of digital objects and of the methods used to create, analyse
and visualise them, and it affords an architecture for issues such as
transparency, replicability, Open Access, sustainability, data manipulation,
accountability and visual display.

Central to the framework is the recognition that illusory, positivistic
notions of the digital are ill-suited to the problems of the digital societies
we live in. The post-authentic framework exposes aspects of knowledge
creation in the digital that oppose both the mainstream fetishisation
of big data and algorithms and an unproblematised understanding of
the digital; it addresses issues such as ambiguity and uncertainty, and
the subjective and interpretative dimension of collecting, selecting,
categorising and aggregating, i.e., the act of creating data. In pursuing
my case for a novel model of knowledge creation in the digital,
I presented a range of personal case studies and examined how the
application of the framework in my own work helped me address aspects
of knowledge creation in the digital such as transparency, documentation
and reproducibility; questions about reliability, authenticity and biases;
and engagement with sources through technology. Using ChroniclItaly 3.0
as a digital object, I applied the post-authentic framework to a variety of
applied contexts such as digital heritage practices, digital linguistic injustice,
critical digital literacy and critical digital visualisation, and I devoted specific
attention to four key aspects of knowledge creation in the digital: creation
of a digital object in Chap. 2, enrichment of a digital object in Chap. 3,
analysis of a digital object in Chap. 4 and visualisation of a digital object
in Chap. 5. This auto-ethnographic and self-reflexive approach allowed
me to show how a re-examination of digital knowledge creation can no
longer be achieved from a distance, but only from the inside. Ultimately,
the book demonstrated that it is only through the conscious awareness of
the delusional belief in the neutrality of data, tools, methods, algorithms,
infrastructures and processes that the biases embedded in these systems and
amplified by their ubiquitous use can in fact be identified and addressed.

In Chap. 3, for example, I showed how, from pre-processing to data
augmentation, the application of the post-authentic framework to the
task of enriching digital material can guide each action of an enrichment
workflow. Using the case examples of DeXTER and ChroniclItaly 3.0
(Viola and Fiscarelli 2021a) and informed by symbiosis and mutualism,
Chap. 3 illustrated how the post-authentic framework can guide the
interaction with the digital, not as a strategic (grant-oriented) or
instrumental (task-oriented) collaboration but as a cognitive mutual
contribution. In particular, I unpacked the ambiguities and uncertainties of
methods such as optical character recognition (OCR), named entity
recognition (NER), geolocation and sentiment analysis (SA) and showed how
the post-authentic framework can help address these challenges, for instance,
through a thorough understanding of the assumptions behind these
techniques, constant updating and critical supervision. The framework
recognises curatorial practices as manipulative interventions which,
especially in the case of cultural heritage material, bear the consequence
of being a source of knowledge for current and future generations.

This book was also a reflection on the implications of the digital
transformation for our perception of the world. Drawing on the mathematical
concepts of discrete vs continuous modelling of information (cf. Chap. 4),
I discussed some of the repercussions of the transformation of continuous
material into discrete form due to the discretisation of society, that is,
its reduction to binary sequences of 0s and 1s, which is especially
consequential for the notions of causality and correlation in relation to
knowledge creation. In discrete systems, causality is hidden because
information is discretised into exact and separate points, which must be
categorised and made explicit. As a result, we are given a digitally mediated
image of the world, meaning that the relational causality of continuous
information is replaced by predictions of correlations. Thus, societies in the
digital in which the ‘big data philosophy’ reigns, I argued, are offered
countless patterns but no explanations for them. We, the digital citizens,
are left to deal with a patterned, yet a-causal, way of making sense of reality.
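A toy example can make the discretisation step tangible. The sketch below is an illustration, not a method described in Chap. 4: it samples a continuous function at a handful of evenly spaced points and rounds each value to a fixed-width binary code. Whatever happens between the samples, and any difference smaller than one quantisation step, is simply not recorded. The function `discretise` and its parameters are hypothetical, and it assumes the input function returns values in [-1, 1].

```python
import math
from typing import Callable, List


def discretise(f: Callable[[float], float], start: float, stop: float,
               n_samples: int, bits: int) -> List[str]:
    """Sample f at evenly spaced points and quantise each value to a binary code.

    Assumes f returns values in [-1, 1] and n_samples >= 2. Values between
    samples, and differences finer than one quantisation level, are lost.
    """
    levels = 2 ** bits
    step = (stop - start) / (n_samples - 1)
    codes = []
    for i in range(n_samples):
        y = f(start + i * step)
        # Map the interval [-1, 1] onto the integer levels 0 .. levels - 1.
        q = round((y + 1) / 2 * (levels - 1))
        codes.append(format(q, f"0{bits}b"))
    return codes


# One full period of a sine wave reduced to five 3-bit codes.
print(discretise(math.sin, 0.0, 2 * math.pi, 5, 3))
```

The resulting list of codes can be stored and compared exactly, but nothing in it records why the values rise and fall; any relation between them has to be inferred from the pattern alone.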

Closely related to this point is the use of metaphorical language to name
computational techniques, such as topic modelling, sentiment analysis
and machine learning (ML); this phenomenon can be seen as a way to
make sense of an a-causal reality. Indeed, conflating specific mathematical
concepts such as discrete vs continuous modelling of information with
such familiar notions has created reassuring expectations that machines
can learn to understand language and somehow provide neutral, precise
and understandable accounts of large quantities of textual material. In
the case of SA, this altered image suggests that the subjectivity of human
emotions can be reduced to two or three categories and quantified according
to probabilistic calculations; in the case of ML, the unique, holistic human
process of experiential learning and of connecting logic with contextual
factors is discretised into probability scores over huge, yet partial,
quantities of discrete data; in the case of topic modelling, the text itself
disappears and so does its continuous structure, i.e., the wider context that
produced it. The computational disassembling of the causal structure by the
dualistic system of 0s and 1s hides the original continuous nature to which
the data refers. The use of metaphorical language such as ‘sentiment’,
‘learning’ and ‘topic’, I argued, has therefore certainly contributed to making
these methods extremely popular, especially outside their fields of origin,
but at the same time, by obfuscating the precise mathematical laws upon
which these techniques are based, it has created unrealistic beliefs.

The post-authentic framework can be a useful tool to guide the unpack- 
ing of properties and assumptions of computational techniques used to 
analyse a digital object. Using topic modelling as an example, in Chap. 4, 
I showed how the framework can be applied to engage critically with 
software. At the core of the framework is the importance of maintaining 
a close connection with the digital object; for example, in the chapter,
I stressed how aspects such as pre-processing, corpus preparation and
choosing the number of topics, which are typically regarded as unproblematic, are in
fact fundamental moments within a topic modelling workflow in which 
the analyst is required to make countless choices. The example of topic 
modelling demonstrates how the post-authentic framework can guide the 
exploration, questioning and challenging of the interpretative potential of 
computation. 
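How much a supposedly routine pre-processing step shapes what a topic model will later work with can be shown in a few lines. The two configurations, the stopword list and the sample sentences below are invented for illustration; they are not the settings used in Chap. 4, and `preprocess` is a hypothetical helper rather than part of any topic modelling library.

```python
import re
from collections import Counter
from typing import FrozenSet, Iterable


def preprocess(docs: Iterable[str], lowercase: bool = True,
               stopwords: FrozenSet[str] = frozenset(), min_len: int = 1) -> Counter:
    """Tokenise documents and count the tokens that survive the chosen filters."""
    counts: Counter = Counter()
    for doc in docs:
        text = doc.lower() if lowercase else doc
        for tok in re.findall(r"\w+", text):
            if len(tok) >= min_len and tok not in stopwords:
                counts[tok] += 1
    return counts


# Hypothetical sample documents and two equally "reasonable" configurations.
docs = ["The Italian colony of New York", "the colony and the port"]
minimal = preprocess(docs)
aggressive = preprocess(docs, stopwords=frozenset({"the", "of", "and"}), min_len=4)

print(len(minimal), sorted(minimal))
print(len(aggressive), sorted(aggressive))
```

In this toy case the second configuration halves the vocabulary, and its minimum token length silently drops 'new', breaking the place name 'New York': a small, easily overlooked choice that changes the material any subsequent model can find patterns in.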

Operating within the post-authentic framework crucially means 
acknowledging digital objects as living entities that have far-reaching, 
unpredictable consequences; the continually changing complexity of
networks involving processes and actors must therefore always be critically
supervised. The visualisation of a digital object is one such process.
The post-authentic framework opposes an uncritical adoption of digital
methods and points to the intrinsic dynamic, situated, interpreted and
partial nature of the digital. Despite often being employed as exact ways
of presenting reality, visualisations are extremely ambiguous techniques 
which embed numerous human decisions and judgement calls. In 
Chap. 5, I illustrated how the post-authentic framework can be applied to 
visualisation by discussing two examples: efforts towards the development 
of a user interface (UI) for topic modelling and the design choices for 
developing the app DeXTER, the interactive visualisation interface to 
explore ChroniclItaly 3.0. I specifically centred my discussion on how 
the ambiguities and uncertainties of topic modelling, network analysis 
(NA) and SA can be encoded visually. A key notion of the post-authentic 
framework is the acknowledgement of curatorial practices as manipulative 
interventions and of how it is in fact through exposing the ambiguities and 
uncertainties that knowledge creation in the digital can be kept honest and 
accountable for current and future generations. 

Through the application of the post-authentic framework to these four
case examples, the book aimed to show how an uncritical and naive
approach to the use of computational methods is bound to reproduce
the very opaque processes that the publicised algorithmic discourse claims
to break and, more worryingly, how it contributes to making society worse.
The book was therefore also a contribution towards systemic change
in knowledge creation practices and, by extension, in society at large; it
provided a new set of notions and methods that can be implemented when
collecting, assessing, reviewing, enriching, analysing and visualising digital
material. It is this more problematised notion of the digital conceptualised
in the framework that highlights how its transcending nature makes old
dichotomies between digital knowledge creation and non-digital knowledge
creation no longer relevant and, in fact, harmful.

The digitisation of society, already well on its way before the COVID-19
pandemic but certainly brought to an irreversible turning point
by the 2020 health crisis, has brought into sharper focus how the digital
exacerbates existing fractures and disparities in society. Unable to deal
adequately with the complexity of society and social change, the current
model of knowledge creation urgently requires a re-theorisation.
This book is therefore a wake-up call for understanding the digital as
no longer contextual to knowledge creation and for recognising that a
model of disciplinary compartmentalisation sustains an anachronistic,
ill-equipped way to encapsulate and explain society. All information is now
digital and algorithms are more and more central nodes of knowledge 
and culture production with an increased capacity to shape society at 
large. As digital vs non-digital positions have entirely lost relevance, it 
has become increasingly futile to create ultra-specialised disciplines from 
other disciplines’ overlapping spaces or indeed to invest energy in trying 
to define those, such as in the case of DH; the digital transformation 
has magnified the inadequacy of a mono-perspective approach, legacy 
of a model of knowledge that compartmentalises competing disciplines. 
Scholars, researchers, universities and institutions must acknowledge the 
central role they have to play in assessing how knowledge is created not 
just today, but also for future generations. 

The new theoretical and methodological framework that I proposed in 
this book moves beyond the current static conceptualisation of knowledge 
production which praises interdisciplinarity but forces knowledge into rigid 
categories. On the contrary, the framework offered novel concepts and
terminologies that break with dialectical principles of dualism and antago- 
nism, including dichotomous notions of digital vs non-digital, sciences vs 
the humanities, authentic vs non-authentic and computational/neutral vs 
non-computational/biased. The re-devised notions, practices and values 
that I offered help re-figure the way in which society conceptualises data, 
technology, digital objects and the process of knowledge creation in the 
digital. 

My re-examination of the current model of knowledge includes not just 
scholarship but pedagogy too. And whilst this is not the main focus of 
this book, the arguments I put forward here for scholarship equally apply 
to pedagogy. In order to achieve systemic change, academic programmes 
must be updated to include opportunities for critical reflection on the
pressing issues stemming from the ubiquitous underpinning of AI in our 
societies. Through real use cases similar to those illustrated throughout the 
chapters of this book, students would learn about the deep implications of 
digital technologies on contemporary culture and society. In the words 
of Timnit Gebru, the research scientist who was recently fired by Google
after exposing how strongly biased Google’s AI systems are (Bender et al.
2021), “The people creating the technology are a big part of the system. If
many are actively excluded from its creation, this technology will benefit a
few while harming a great many”. Indeed, as technology is a central locus
of knowledge and culture production and AI technology in particular is
dominated by an almost entirely male, predominantly white workforce, the
culture that is produced replicates the biases of the workforce that is
building it.

Although there may not be any initial intention of using biased models, 
tech companies become immediately accountable, at least from an ethical 
perspective if not yet a legal one, as soon as they refuse to acknowledge 
and correct such biases even when these are clearly exposed. If it is true 
that governments are spectacularly behind in creating rules for the ethical 
use of this technology, it is equally true that big tech companies shouldn’t 
wait for laws to be passed. Because of the serious social repercussions of
the technology they create, they have a responsibility to bring this issue
to the centre of their organisations. Meanwhile, universities also have a
responsibility to train the next generation of thinkers, scholars and
academics, as well as digital citizens at large, in ethical digital management.
Equally, research funding agencies must specifically require that the issue 
of digital ethics is explicitly addressed by researchers in their projects, for 
instance, by demanding a digital critical component in their proposals. As 
users and co-producers of technology, our responsibility is to
counterbalance the main AI discourse with new, more honest narratives, to
critically reflect on how we are producing knowledge today and for tomorrow,
and on what we want the next generation of students and digital citizens
to be like. The post-authentic framework of knowledge creation in the
digital provides a framework to communicate and incorporate values of
honesty, accountability, transparency and sustainability into knowledge. It
reminds us that a racist, sexist, homophobic digital society is not so much
a reflection of human subjectivity in data and algorithms as proof of its
pretended absence.